diff --git a/DmitriyVkhtiuk_FT/README.md b/DmitriyVkhtiuk_FT/README.md new file mode 100644 index 00000000..1a4d682d --- /dev/null +++ b/DmitriyVkhtiuk_FT/README.md @@ -0,0 +1,331 @@ +## CREATION GOAL +The rss-reader was created as the final task of the EPAM March 2022 Python Foundation course. +My personal goal was to build an rss-reader using OOP and to anticipate every issue that might arise while the script is running. + +## PRODUCT VERSION +The current rss-reader version is 1.5. + +## NOTABLE MOMENTS +All commands listed in this README may vary according to your system settings. +The most common difference is how Python is invoked from the console: depending on the system it may be 'python', 'python3', 'py', and so on. +The script has been tested on a clean machine with Python 3.9.13, so adjust the commands to your system settings if needed. + +## REQUIREMENTS + +The current version of the product requires Python 3.9 or higher to run. The product has not been tested on earlier versions of the Python interpreter. + +Please be aware that many operating systems come with Python pre-installed, but those installations may be out of date or even lack some built-in packages. +If you hit errors caused by such issues, please update your Python interpreter. +- On Windows this is easily done by running an installer downloaded from http://www.python.org.
+- A quick guide for updating it on Ubuntu can be found here: +https://linuxize.com/post/how-to-install-python-3-9-on-ubuntu-20-04/ + +Required third-party packages with versions used while developing rss-reader (can be found in requirements.txt file in 'final_task' directory): +``` +setuptools~=62.3.3 *is installed by default in Python 3.4 and higher +arabic-reshaper==2.1.3 +asn1crypto==1.5.1 +beautifulsoup4==4.11.1 +certifi==2022.6.15 +cffi==1.15.0 +charset-normalizer==2.0.12 +click==8.1.3 +colorama==0.4.5 +coverage==6.4.1 +cryptography==37.0.2 +cssselect2==0.6.0 +future==0.18.2 +html5lib==1.1 +idna==3.3 +Pygments==2.12.0 +lxml==4.9.0 +numpy==1.23.0 +oscrypto==1.3.0 +pandas==1.4.3 +Pillow==9.1.1 +pycparser==2.21 +pyHanko==0.13.1 +pyhanko-certvalidator==0.19.5 +PyPDF3==1.0.6 +python-bidi==0.4.2 +python-dateutil==2.8.2 +pytz==2022.1 +pytz-deprecation-shim==0.1.0.post0 +PyYAML==6.0 +qrcode==7.3.1 +reportlab==3.6.10 +requests==2.28.0 +six==1.16.0 +soupsieve==2.3.2.post1 +svglib==1.3.0 +tinycss2==1.1.1 +tqdm==4.64.0 +tzdata==2022.1 +tzlocal==4.2 +uritools==4.0.0 +urllib3==1.26.9 +webencodings==0.5.1 +xhtml2pdf==0.2.8 +``` + +## INSTALLATION +Rss-reader is cross-platform and has been tested on Windows and UNIX-based systems (Ubuntu). + +The current version of the product can be used with or without installation. +By default, the root directory of the script is called 'final_task', and the script directory is 'rss_reader'. +If you change them while using script - adjust your commands appropriately. + +1. Installation as script: +``` +1.1. copy or extract root directory of script (final_task) with all files and subdirectories to a local directory + +1.2. Install and activate virtual environment, to do it, use following commands while being in root directory of the script: + +1.2.1. $ python -m venv venv (for UNIX-based systems) + or + python -m venv venv (for Windows) + +1.2.2. 
$ source venv/bin/activate (for UNIX-based systems) + or + venv\Scripts\activate.bat or venv\Scripts\activate.ps1 or venv\Scripts\activate (for Windows) + +1.3. Install the required packages into your fresh virtual environment by using the following command while in the root directory +of the script: + +1.3.1. $ pip install -r requirements.txt (for UNIX-based systems) + or + pip install -r requirements.txt (for Windows) +``` + +2. Installation as a CLI utility: +Installation as a CLI utility requires steps 1.1 and 1.2 of part 1 of the INSTALLATION section to be completed first. +``` +2.1. In your command line terminal navigate to the root directory of the script + +2.2. While in the root directory, install the script by executing the command: + $ python setup.py install (for UNIX-based systems) + or + python setup.py install (for Windows) +``` + +## USING RSS-READER + +Rss-reader is a Command Line Interface application whose available options can be shown by calling the script with the -h (--help) argument. It can be used in two ways, which differ only in how the script is called: + +1. Using as a script: + + +1.1. While in the script directory, it can be called with the following command: + + python rss_reader.py [-h] [--limit LIMIT] [--json] [--verbose] [--version] [--date DATE] [--html HTML_PATH] [--pdf PDF_PATH] [--colorized] [source] + +2. Using as a CLI utility + +2.1. While in the root directory of the script, if it was previously installed as a CLI utility as described above, it can be called with the following command: + + rss_reader [-h] [--limit LIMIT] [--json] [--verbose] [--version] [--date DATE] [--html HTML_PATH] [--pdf PDF_PATH] [--colorized] [source] + +When calling the script in either of the above ways, the following arguments can be used: +``` + +positional arguments: + source RSS URL + +options: + -h, --help show this help message and exit + --limit LIMIT Limit news topics if this parameter provided.
(MUST expect one argument) + --json print result as JSON in stdout. + --verbose Outputs verbose status messages + --version Print version info + --date DATE Fetch news from local cache by date + --html HTML_PATH Convert news to .html + --pdf PDF_PATH Convert news to .pdf + --colorized Outputs in colorize mode + + +source is a positional argument but can be skipped, because it is declared with nargs="?" and its default value is None. +In this case, if --date is not specified either, the script prints an error message telling the user to check the given RSS URL and try again, because the request has a bad status code. + + +The --help (-h) argument prints the script's help information (listed above) and exits the script. +The --version argument prints the script's version and exits the script. +The --verbose argument enables verbose logging while the script runs. +The --json argument converts the news data to JSON format and prints the JSON to the user; its structure is described later. +The --colorized argument enables colored output mode. +The --pdf PDF_PATH argument converts the news data to PDF format and saves it as a file. +The --html HTML_PATH argument converts the news data to HTML format and saves it as a file. + - Both [--pdf PDF_PATH] and [--html HTML_PATH] result in the news being saved in the corresponding file format. + - PDF_PATH and HTML_PATH are optional values that take a path to the directory (or directory/file) where the user wants to save the converted files. + - If no path is provided, or it is not a valid directory, or permission to change the given directory is denied, the converted file is saved in the default "default_dir" directory in the current directory. + - If PDF_PATH / HTML_PATH is a non-existing directory, rss-reader tries to create the full path to that directory for the user + and falls back to the default "default_dir" directory in the script directory if creation fails. + - It is recommended to use absolute paths for PDF_PATH and HTML_PATH; relative paths are resolved by the script + against the directory the script is called from and can lead to unexpected locations. To help handle such + situations, the save path is printed after the file is successfully saved. + + +The --limit LIMIT argument sets the number of news items shown to the user: + - LIMIT must be a positive integer indicating how many news items from the feed will be shown. + - With no --limit set, the user gets all available news from the feed. + - If --limit <= 0, "News limit must be positive number" is printed and the script exits. + - A LIMIT that exceeds the number of news items in the feed also results in the user getting all available news. + - Passing a non-integer LIMIT prints a user-oriented error message and stops the script run. + - When limiting news with --limit, the script chooses the most recent news items according to their publication dates. + +The --date DATE argument loads previously cached news for the given date (no Internet connection is needed for this to work). + - Providing the --date argument results in news being sorted by publication time in descending order. + - DATE must be a string of digits in %Y%m%d format (YYYYMMDD), e.g. 20210613 (which is 13 June 2021). + - Combining --date DATE with the source argument results in only the news fetched from that source being + loaded from the cache. + - If --date DATE is given without source, news from all sources in the local cache are loaded. + - If no news in the cache match the given date, the script prints an error message and stops the script run.
+``` + +Arguments can be used in combinations, e.g.: +``` +python rss_reader.py +python rss_reader.py --version +python rss_reader.py --version --verbose --limit 1000 https://news.yahoo.com/rss/ +python rss_reader.py --help --verbose --limit 1000 https://news.yahoo.com/rss/ +python rss_reader.py --verbose --limit 3 https://news.yahoo.com/rss/ +python rss_reader.py --verbose --limit 1000 https://news.yahoo.com/rss/ +python rss_reader.py --json --verbose --limit 5 --date 20220617 https://news.yahoo.com/rss/ +python rss_reader.py --json --verbose --limit 5 --date 20220621 +python rss_reader.py --json --verbose --pdf --html --limit 5 --date 20220613 +python rss_reader.py --json --verbose --pdf c:/users/dviht/converted --html output/final --limit 5 --date 20220610 +python rss_reader.py --json --verbose --pdf c:/users/dviht/converted --html output/final --limit 5 https://news.yahoo.com/rss/ + +``` + +## BASIC FUNCTIONALITY + +Class NewsBrain is the base class of rss-reader. It gathers the required information from rss feeds and prints a dictionary with the valuable data. +It caches gathered news for later use, using the pandas library. +It provides methods for printing data to stdout, with an option to convert it to JSON format. +It provides the converter functionality and enables the logger when the --verbose argument is specified. +It reformats the news dates in the cache to make it easy to find the news whose date was specified in the --date argument. + + +### JSON OUTPUT STRUCTURE +During runtime, the script converts the gathered news data into JSON when it is run with the --json argument. +The number of news items in the JSON object is affected by the --limit argument if it is provided. +``` +{ + "index_of_news_item": { + "Source": "RSS URL of feed", + "Feed": "Title of feed", + "Title": "Title of article", + "Date": "pubDate of article", + "Link": "Link to the article", + "Description": "Description of article", + "Image": "Article image" + } +} +``` +### CACHING +The cache location differs depending on how the script is called: + +1.
Using as a script: + +1.1. + The cache.csv file with the cache appears in the script directory. + +2. Using as a CLI utility: + +2.1. + The cache is saved in some_path/venv/lib/site-packages/rss_reader-?-py?.egg/rss_reader/cache.csv. + +News is stored in the csv file and processed using the pandas library. +If the --date argument is specified and valid, the dates of all cached news are reformatted into %Y%m%d format to make searching for news with the specified pubDate easy. +If some dates in the cache cannot be reformatted, they are ignored (left unchanged) and the message "Can't reformat the date {problem date}" is printed. +``` +Source,Feed,Title,Date,Link,Description,Image +https://news.yahoo.com/rss/,Yahoo News - Latest News & Headlines,Officials: Georgia man sentenced to die kills self in prison,2022-06-27T12:50:43Z,https://news.yahoo.com/officials-georgia-man-sentenced-die-125043110.html,"JACKSON, Ga. (AP) — A Georgia man who was recently sentenced to death in the killings of two corrections officers during an escape attempt five years ago has died in prison of an apparent suicide, corrections officials said.Prison guards found Ricky Dubose unresponsive in his cell at the Georgia Diagnostic and Classification Prison in Jackson around 4:45 p.m. Sunday, according to a Department of Corrections news release. The guards called for medical help and began rendering aid. The coroner at the prison declared Dubose dead at 5:56 p.m.",https://s.yimg.com/uu/api/res/1.2/zEeBoPLQVzw1u2VjZt.THA--~B/aD03NDk7dz0xMDAwO2FwcGlkPXl0YWNoeW9u/https://media.zenfs.com/en/ap.org/5cf8e923a8d9dec8758480785184f376 +https://news.yahoo.com/rss/,Yahoo News - Latest News & Headlines,Chinese father breaks down after son he tutored daily for a year scores a 6/100 on math exam,2022-06-28T23:18:34Z,https://news.yahoo.com/chinese-father-breaks-down-son-231834062.html,"A Chinese father who reportedly tutored his son daily for a year went viral for bursting into tears after his son scored six out of 100 points on a math exam.The child’s parents from Zhengzhou, Henan Province, received his test score on June 23. Upon learning that their son had only received six points for his final math test, the father burst into tears, as seen in a video posted to Weibo by Qilu Evening News. ",https://s.yimg.com/uu/api/res/1.2/JUWswwp8z.axCjm0RqxLoQ--~B/aD00MjU7dz04MDA7YXBwaWQ9eXRhY2h5b24-/https://media.zenfs.com/en/nextshark_articles_509/1b639cbcb8324799b67404180a9fddcd +``` + +### DEFAULT OUTPUT STRUCTURE +- With the source argument provided, the Feed tags are formed from the feed information in the news dictionary. +- If the script is used with the --date DATE argument and no source argument, the news dictionary is formed from a pandas.DataFrame with dates reformatted into %Y%m%d format. +- If the --json argument is provided, the output takes the form of the before-mentioned JSON object. +- If --json is not provided, the script by default prints the news in the following structure (the number +of news items printed is affected by the --limit argument if it is provided): +``` + + Source: "RSS URL of feed", + Feed: "Title of feed", + Title: "Title of article", + Date: "pubDate of article", + Link: "Link to the article", + Description: "Description of article", + Image: "Article image" + +``` + + + +### CONVERSION TO FILE +If the --pdf or --html argument is provided, the script converts the news to a file of the corresponding format (the number of news items converted is affected +by the --limit argument if it is provided). The file will contain pictures and links if they existed in the original article and rss-reader managed +to find and process them during conversion. + +The default output directory, whether rss-reader is used as a script or as a CLI utility, is the 'default_dir' directory in the current local directory. If there is no such directory in the script directory or in the root directory of the script, it is created upon the first conversion performed with an invalid path. +To make the files easier to find, rss-reader prints the path to the converted files after conversion. + +## LOGGING +Logging is always enabled, but printing logs to the console depends on the --verbose argument. If it is specified, logs are displayed in the console. +The default logging settings are listed in the news_brain.py module in the script directory and contain the following: +``` + log_format = "%(asctime)s - %(message)s \n" + log.basicConfig(level=log.DEBUG, format=log_format) + logger = log.getLogger() + return logger + +``` + +## TESTING +Unittests for the script are located in the file tests.py in the script directory, which is by default the 'rss_reader' directory in the root directory of the script. +The unittests require test files to operate correctly; those files are located in the 'test_files' subdirectory of the script directory.
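As an illustration of what such unit tests can look like, here is a minimal, self-contained sketch that checks pubDate normalization to the %Y%m%d cache format. The helper `to_cache_date` and the `PUBDATE_FORMATS` list are hypothetical stand-ins for illustration, not the actual tests.py or NewsBrain API:

```python
import unittest
from datetime import datetime

# Hypothetical stand-in mirroring the cache-date normalization described above;
# the name to_cache_date and this format list are illustrative, not the real API.
PUBDATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S %z",   # RFC 822 style, numeric offset
    "%Y-%m-%dT%H:%M:%SZ",         # ISO 8601 with literal 'Z'
    "%a, %d %b %Y %H:%M:%S %Z",   # RFC 822 style, timezone name
    "%Y-%m-%dT%H:%M:%S%z",        # ISO 8601, numeric offset
)


def to_cache_date(raw):
    """Normalize a feed pubDate string to YYYYMMDD, or return it unchanged."""
    for fmt in PUBDATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    return raw


class ToCacheDateTest(unittest.TestCase):
    def test_iso_date_is_normalized(self):
        self.assertEqual(to_cache_date("2022-06-27T12:50:43Z"), "20220627")

    def test_rfc822_date_is_normalized(self):
        self.assertEqual(to_cache_date("Mon, 13 Jun 2022 10:00:00 +0000"), "20220613")

    def test_unparseable_value_is_left_unchanged(self):
        self.assertEqual(to_cache_date("Date not found"), "Date not found")
```

Tests structured this way run with the standard `python -m unittest` command.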
+To run the unittests, the user can use the following command while in the script directory: +``` +$ python -m unittest tests.py (for UNIX-based systems) +or +python -m unittest tests.py (for Windows) +``` +Test coverage for the current version is: +``` +Name Stmts Miss Cover Missing +------------------------------------------------------ +font\__init__.py 0 0 100% +modified_argparser.py 28 8 71% 12-13, 31-32, 34-35, 42, 45 +news_brain.py 285 93 67% 64-65, 73-76, 87-88, 107, 149-150, 153-154, 157-158, 161-162, 169-178, 185-186, 195-196, 210-218, 225-226, 238-241, 296-297, 329, 341, 352-363, 366-398, 437-438, 441-447, 452, 458, 463-464 +template.py 1 0 100% +test_files\__init__.py 0 0 100% +tests.py 110 1 99% 246 +------------------------------------------------------ +TOTAL 424 102 76% + +``` + +### The script has been tested on the following feeds: +``` + +https://www.latimes.com/local/rss2.0.xml +https://www.usda.gov/rss/latest-releases.xml +https://www.yahoo.com/news/rss +https://cdn.feedcontrol.net/8/1114-wioSIX3uu8MEj.xml +https://moxie.foxnews.com/feedburner/latest.xml +https://feeds.simplecast.com/54nAGcIl +https://money.onliner.by/feed +https://vse.sale/news/rss +https://news.google.com/rss/ +https://www.nytimes.com/svc/collections/v1/publish/https://www.nytimes.com/section/world/rss.xml +https://www.cnbc.com/id/100727362/device/rss/rss.html +https://www.cbsnews.com/latest/rss/world +https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml +https://auto.onliner.by/feed +http://feeds.bbci.co.uk/news/world/rss.xml +https://www.buzzfeed.com/world.xml +``` \ No newline at end of file diff --git a/DmitriyVkhtiuk_FT/dist/rss-reader-1.5.tar.gz b/DmitriyVkhtiuk_FT/dist/rss-reader-1.5.tar.gz new file mode 100644 index 00000000..8697a1e5 Binary files /dev/null and b/DmitriyVkhtiuk_FT/dist/rss-reader-1.5.tar.gz differ diff --git a/DmitriyVkhtiuk_FT/requirements.txt b/DmitriyVkhtiuk_FT/requirements.txt new file mode 100644 index 00000000..2339127c --- /dev/null +++
b/DmitriyVkhtiuk_FT/requirements.txt @@ -0,0 +1,43 @@ +arabic-reshaper==2.1.3 +asn1crypto==1.5.1 +beautifulsoup4==4.11.1 +certifi==2022.6.15 +cffi==1.15.0 +charset-normalizer==2.0.12 +click==8.1.3 +colorama==0.4.5 +coverage==6.4.1 +cryptography==37.0.2 +cssselect2==0.6.0 +future==0.18.2 +html5lib==1.1 +idna==3.3 +lxml==4.9.0 +numpy==1.23.0 +oscrypto==1.3.0 +pandas==1.4.3 +Pillow==9.1.1 +pycparser==2.21 +Pygments==2.12.0 +pyHanko==0.13.1 +pyhanko-certvalidator==0.19.5 +PyPDF3==1.0.6 +python-bidi==0.4.2 +python-dateutil==2.8.2 +pytz==2022.1 +pytz-deprecation-shim==0.1.0.post0 +PyYAML==6.0 +qrcode==7.3.1 +reportlab==3.6.10 +requests==2.28.0 +six==1.16.0 +soupsieve==2.3.2.post1 +svglib==1.3.0 +tinycss2==1.1.1 +tqdm==4.64.0 +tzdata==2022.1 +tzlocal==4.2 +uritools==4.0.0 +urllib3==1.26.9 +webencodings==0.5.1 +xhtml2pdf==0.2.8 \ No newline at end of file diff --git a/DmitriyVkhtiuk_FT/rss_reader/__init__.py b/DmitriyVkhtiuk_FT/rss_reader/__init__.py new file mode 100644 index 00000000..b012827a --- /dev/null +++ b/DmitriyVkhtiuk_FT/rss_reader/__init__.py @@ -0,0 +1,5 @@ +import sys +import os + +sys.path.append((os.path.dirname(__file__))) + diff --git a/DmitriyVkhtiuk_FT/rss_reader/font/__init__.py b/DmitriyVkhtiuk_FT/rss_reader/font/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/DmitriyVkhtiuk_FT/rss_reader/font/calibri.ttf b/DmitriyVkhtiuk_FT/rss_reader/font/calibri.ttf new file mode 100644 index 00000000..aac47261 Binary files /dev/null and b/DmitriyVkhtiuk_FT/rss_reader/font/calibri.ttf differ diff --git a/DmitriyVkhtiuk_FT/rss_reader/modified_argparser.py b/DmitriyVkhtiuk_FT/rss_reader/modified_argparser.py new file mode 100644 index 00000000..380e1e20 --- /dev/null +++ b/DmitriyVkhtiuk_FT/rss_reader/modified_argparser.py @@ -0,0 +1,45 @@ +import argparse +import sys +from pathlib import Path + + +class ArgParser(argparse.ArgumentParser): + def error(self, message): + """ + Method modifies default errors from argparse.ArgumentParser class. 
+ Prints a usage message incorporating the error message to stderr and exits. + """ + self.print_help(sys.stderr) + self.exit(2, 'Something went wrong -> %s\nPlease, check the "help" and try again\n' % message) + + def valid_path(self, path): + """ + Method checks whether the given path exists and tries to create it if it doesn't. + If the path exists or is created successfully, the method returns the given path; + otherwise it returns the default path for saving files. + :param path: path to check + :return: path for saving files + """ + p = Path(path) + default = Path.cwd() / "default_dir" + if p.exists(): + return p + else: + # recursive call creates any missing parent directories first + self.valid_path(p.parent) + try: + if p.suffix == ".pdf": + with open(p, "wb"): + pass + elif p.suffix == ".html": + with open(p, "w"): + pass + else: + if not p.is_dir(): + p.mkdir() + except OSError: + print("Invalid path.. Saving to the project default dir") + if not default.is_dir(): + default.mkdir() + return default + else: + return p diff --git a/DmitriyVkhtiuk_FT/rss_reader/news_brain.py b/DmitriyVkhtiuk_FT/rss_reader/news_brain.py new file mode 100644 index 00000000..660aaff3 --- /dev/null +++ b/DmitriyVkhtiuk_FT/rss_reader/news_brain.py @@ -0,0 +1,464 @@ +""" +Main rss-reader module which performs all the needed operations on the data passed to the script call. +Parses the rss feed, extracts the needed data, prints it out, converts output to JSON format, creates the cache, +converts information to html or pdf, etc.
+""" +import colorama +import requests +import pandas +import bs4 +import html +import json +import logging as log +from datetime import datetime +import io +from template import temp +from colorama import Fore +from pygments import highlight, lexers, formatters +import os +from pathlib import * +from xhtml2pdf import pisa +from xhtml2pdf.default import DEFAULT_FONT +from reportlab.pdfbase import pdfmetrics +from reportlab.pdfbase.ttfonts import TTFont + + +work_dir = Path(__file__).absolute().parent +colorama.init() + + +class NewsBrain: + """ + Base class of rss-reader, gathers required information from rss-feeds and print a dictionary with valuable data. + Caches gathered news for later use, using pandas lib. + Provides methods for printing data in stdout with option of converting to JSON format. + News dictionary and JSON structure are described in README.md + provides converters functionality + """ + def __init__(self, url, limit, js, date, html_path, pdf_path, colorized): + """ + Method serves for processing news from rss feeds. + On object creation it gathers required news data, updates news cache and processes data for future output. + :param url: default = None. URL of rss-feed. + :param limit: default value = None. + If not None, limits the number of news which will be printed in stdout or wil be converted to the specified + format. + :param js: default value = False. if True, outputs data in JSON format. + :param date: default = None. If not None, indicates the date on which the local cache will be searched. + :param html_path: default = None. if not None, the path that indicates where the html file will be saved. + :param pdf_path: default = None. if not None, the path that indicates where the pdf file will be saved. 
+ """ + + self.url = url + self.limit = limit + self.json = js + self.date = date + self.html_path = html_path + self.pdf_path = pdf_path + self.colorized = colorized + + try: + self.cache = pandas.read_csv(work_dir/"cache.csv") + except FileNotFoundError: + self.cache = pandas.DataFrame(columns=["Source", "Feed", "Title", "Date", "Link", "Description", "Image"]) + except pandas.errors.EmptyDataError: + self.cache = pandas.DataFrame(columns=["Source", "Feed", "Title", "Date", "Link", "Description", "Image"]) + + @staticmethod + def create_logger(): + """ + This file sets logging configuration, by default, according to the task, logging level is set to Debug + :return logger + """ + log_format = "%(asctime)s - %(message)s \n" + log.basicConfig(level=log.DEBUG, format=log_format) + logger = log.getLogger() + return logger + + def get_rss_data(self): + """ + The method takes a URL as an argument and returns request response in form of object of bs4.BeautifulSoup. + If URL is invalid, or None, prints clear user message about bad connection status and asks to check internet + connection or RSS URL. + :return: object of bs4.BeautifulSoup, parsed with xml parser, containing rss data + """ + try: + response = requests.get(self.url) + except requests.exceptions.RequestException: + print("Bad request status. Check your internet connection or RSS link and try again.") + else: + page_code = response.text + xml_data = bs4.BeautifulSoup(page_code, "xml") + return xml_data + + @staticmethod + def get_news_text(url): + """ + Method which is called only if there is not section in rss data to try fetch the first and second + paragraphs as a description or firs paragraph of article as the description of new. + :param url: link of html page with article + :return first and second paragraphs is they exist, else try to return first paragraph of article, if there is no +
section on web page, prints 'article not found' and return None. + """ + response = requests.get(url) + root = bs4.BeautifulSoup(response.content, 'html.parser') + article = root.select_one('article') + if article is None: + text = "Something went wrong.. Article not found.. Please, click on the link to read the news" + else: + list_of_paragraphs = article.find_all('p') + try: + text = list_of_paragraphs[0].text + list_of_paragraphs[1].text + except IndexError: + text = list_of_paragraphs[0].text + return text + + def get_news(self, xml_data, limit=None): + """ + A method that finds the required information in the bs4.BeautifulSoup object and writes it to the corresponding + dictionary keys. + if the object does not have a corresponding description section, the method tries to get the first two + paragraphs of the article. If the self.json parameter is specified(True), outputs information in JSON format. + If the method cannot fetch the information from a particular tag, the value of the dictionary key = not found. + Also updates the local news cache and converts the news to html or pdf if the paths for these parameters have + been specified. + :param xml_data: object of bs4.BeautifulSoup which contains all needed information about news. + :param limit: limits the number of news which will be printed in stdout or wil be + converted to the specified format. + :return: if limit <= 0 prints an error 'News limit must be positive number' and return None. + If RSS URL does not contain standard information of xml file like , prints + "News Not Found. Please, check your RSS URL." and return None. + """ + data = {} + list_of_news = [] + + news = xml_data.find_all("item") + if len(news) == 0: + print("News Not Found. 
Please, check your RSS URL.") + return + else: + if limit is None or limit > len(news): + limit = len(news) + elif limit <= 0: + print("News limit must be positive number") + return + for i in range(limit): + data["Source"] = self.url + try: + data["Feed"] = xml_data.channel.title.text + except AttributeError: + data["Feed"] = "Feed not found.." + try: + data["Title"] = html.unescape(news[i].title.text.replace("\xa0", " ")) + except AttributeError: + data["Title"] = "Title not found.." + try: + data["Date"] = news[i].pubDate.text + except AttributeError: + data["Date"] = "Date not found" + try: + data["Link"] = news[i].link.text + except AttributeError: + data["Link"] = "Link not found.." + try: + desc = news[i].find("description").text + soup = bs4.BeautifulSoup(desc, "html.parser") + data["Description"] = html.unescape(soup.p.text.replace("\n", " ").replace("\xa0", " ")) + if data["Description"] == "": + data["Description"] = xml_data.channel.description.text + except AttributeError: + try: + desc = news[i].find("description").text + if '<' not in desc: + data["Description"] = html.unescape( + news[i].find("description").text.replace("\n", " ").replace("\xa0", " ")) + else: + data["Description"] = xml_data.channel.description.text + except AttributeError: + data["Description"] = html.unescape( + self.get_news_text(data["Link"]).replace("\n", " ").replace("\xa0", " ")) + try: + data["Image"] = news[i].find("media:content").get("url") + except AttributeError: + try: + data["Image"] = xml_data.image.url.text + except AttributeError: + data["Image"] = "Image not found" + finally: + cached_d = {key: value for key, value in data.items()} + list_of_news.append(cached_d) + log.info(f"Printing {i + 1} new") + print(Fore.RESET) + if not self.json: + self.print_data(data) + else: + js_data = {i + 1: data} + self.print_data(js_data) + cached_news = pandas.DataFrame(list_of_news) + cache_merge = pandas.merge(self.cache, cached_news, 
how="outer").drop_duplicates(subset="Title") + cache_merge.to_csv(work_dir/"cache.csv", index=False) + self.convert(cached_news) + + def print_data(self, data): + """ + A method that prints a dictionary with news tags. If the json parameter is specified, + converts the dictionary to a json string and prints it to stdout. + :param data: the dictionary which will be printed in stdout + :return: None + """ + if self.colorized: + if not self.json: + for key, value in data.items(): + print(f"{Fore.BLUE}{key}: {Fore.YELLOW}{value}") + print("\n\n") + else: + json_format = json.dumps(data, ensure_ascii=False, indent=4) + colored_json = highlight(json_format, lexers.JsonLexer(), formatters.TerminalFormatter()) + print(f"{colored_json}\n\n") + print(Fore.LIGHTCYAN_EX) + else: + if not self.json: + for key, value in data.items(): + print(f"{key}: {value}") + print("\n\n") + else: + json_format = json.dumps(data, ensure_ascii=False, indent=4) + print(f"{json_format}\n\n") + + @staticmethod + def reformat_the_dates(): + """ + Method reformats the dates of the data frame to the "%Y%m%d" format without modifying the cache file itself. + If no date template matches, the date remains unchanged. + + :return: pandas DataFrame with dates, sorted in order from newest to oldest.
+ """ + try: + df = pandas.read_csv(work_dir/"cache.csv") + except FileNotFoundError: + return + except pandas.errors.EmptyDataError: + return + else: + log.info("Prepare dates for search in df..") + list_of_dates = df.Date + list_of_new_dates = [] + + for elem in list_of_dates: + try: + str_to_date = datetime.strptime(elem, "%a, %d %b %Y %H:%M:%S %z") + d_2 = datetime.strftime(str_to_date, "%Y%m%d") + list_of_new_dates.append(d_2) + except ValueError: + try: + str_to_date = datetime.strptime(elem, "%Y-%m-%dT%H:%M:%SZ") + d_2 = datetime.strftime(str_to_date, "%Y%m%d") + list_of_new_dates.append(d_2) + except ValueError: + try: + str_to_date = datetime.strptime(elem, "%a, %d %b %Y %H:%M:%S %Z") + d_2 = datetime.strftime(str_to_date, "%Y%m%d") + list_of_new_dates.append(d_2) + except ValueError: + try: + str_to_date = datetime.strptime(elem, "%Y-%m-%dT%H:%M:%S%z") + d_2 = datetime.strftime(str_to_date, "%Y%m%d") + list_of_new_dates.append(d_2) + except ValueError: + print(f"Can't reformat the date {elem}") + list_of_new_dates.append(elem) + + df.Date = list_of_new_dates + return df.sort_values(by="Date", ascending=False).reset_index(drop=True) + + def df_to_dict(self, df): + """ + the method that takes the dataframe as input converts its data into a dictionary and outputs it to the stdout, + but the dataframe itself returns unchanged. + :param df: pandas DataFrame, which will be converted into a dict, and printed out. + :return: pandas DataFrame without changing. 
+ """ + log.info("prepare dict from dataframe..") + cache_to_output = {} + for index, row in df.iterrows(): + cache_to_output["Source"] = row.Source + cache_to_output["Feed"] = row.Feed + cache_to_output["Title"] = row.Title + cache_to_output["Date"] = row.Date + cache_to_output["Link"] = row.Link + cache_to_output["Description"] = row.Description + cache_to_output["Image"] = row.Image + log.info(f"Printing {index+1} new from cache") + print(Fore.RESET) + if not self.json: + self.print_data(cache_to_output) + else: + js_data = {index + 1: cache_to_output} + self.print_data(js_data) + return df + + def convert(self, cached_data): + """ + Method modifies the dataframe with adding img and a href tags according to the html image and link tags, + and converts the dataframe to html using a previously created template for the html format. + If the html parameter is specified, saves the html file in the appropriate path, if the path for html is not + specified, but specified for PDF, converts the already prepared string with html code to PDF. + If it is not possible to save the file in the specified path, it creates a default folder and saves the + files there. If a file name is specified, and specified path is valid to create file, + saves it under this name, if not - saves it under default name news.pdf or news.html. + Prints to stdout, where file is saved after saving. + Disables warnings and from xhtml2pdf lib logger for case if internet connection is not provided. It doesn't + print xhtml2pdf lib logs to console. + Change default font of this lib to display cyrillic correctly. + :param cached_data: pandas dataframe that converts to the specified format. 
+        :return: None
+        """
+        default = Path.cwd() / "default_dir"
+        df = cached_data
+
+        def create_images(img):
+            """
+            Builds a template with the local image path so that the dataframe
+            can be converted to HTML without an internet connection and images
+            are displayed correctly.
+            :param img: local path to the image
+            :return: template for the 'Image' tag used when converting the dataframe to HTML
+            """
+            if img != "Image not found":
+                img_template = '''<img src="{img}" alt="image">'''.format(img=img)
+            else:
+                img_template = '''{img}'''.format(img=img)
+            return img_template
+
+        def create_url(url):
+            """
+            Builds a template for links so that they are displayed correctly
+            when the dataframe is converted to HTML.
+            :param url: the Link value of a dataframe row
+            :return: template for the 'Link' tag used when converting the dataframe to HTML
+            """
+            if url != "Link not found..":
+                url_template = '''<a href="{url}">Click me!!</a>'''.format(url=url)
+            else:
+                url_template = '''{url}'''.format(url=url)
+            return url_template
+
+        for index, row in df.iterrows():
+            row.Image = create_images(row.Image)
+            row.Source = create_url(row.Source)
+            row.Link = create_url(row.Link)
+
+        t = df.to_html(render_links=True, escape=False, index=False, justify="center")
+        html_out = temp.format(outp=t)
+        if self.html_path is not None:
+            if self.html_path.suffix != ".html":
+                self.html_path = self.html_path / 'news.html'
+            try:
+                with open(self.html_path, 'w', encoding="utf-8") as f:
+                    f.write(html_out)
+                print(f"Was saved there: {os.path.abspath(self.html_path)}")
+            except OSError:
+                if not default.is_dir():
+                    default.mkdir()
+                with open(default / "news.html", 'w', encoding="utf-8") as f:
+                    f.write(html_out)
+                print(f"Invalid path was given.. Was saved there: {os.path.abspath(default / 'news.html')}")
+
+        if self.pdf_path is not None:
+            source_html = html_out
+            if self.pdf_path.suffix != ".pdf":
+                self.pdf_path = self.pdf_path / 'news.pdf'
+
+            output_filename = self.pdf_path
+
+            def convert_html_to_pdf(html_string, path_to_save):
+                """
+                Converts an HTML string to PDF format.
+
+                :param html_string: string with the HTML code of the page
+                :param path_to_save: path where the PDF file will be saved
+                :return: None
+                """
+                result_file = None
+                font_path = work_dir / 'font' / 'calibri.ttf'
+                pdfmetrics.registerFont(TTFont('ru-readable', font_path))
+                DEFAULT_FONT["helvetica"] = "ru-readable"
+
+                try:
+                    result_file = open(path_to_save, "w+b")
+                    print(f"Was saved there: {os.path.abspath(path_to_save)}")
+                except OSError:
+                    if not default.is_dir():
+                        default.mkdir()
+                    result_file = open(default / "news.pdf", "w+b")
+                    print(f"Invalid path was given.. Was saved there: {os.path.abspath(default / 'news.pdf')}")
+                finally:
+                    pisa.CreatePDF(io.StringIO(html_string), dest=result_file, encoding="utf-8")
+                    result_file.close()
+                return
+
+            log.disable(log.ERROR)
+            convert_html_to_pdf(source_html, output_filename)
+
+    def print_from_cache(self, limit=None):
+        """
+        Selects the required news by date, limit and source, prints them and, if
+        the corresponding arguments are given, converts the data to the
+        requested format. Dates in the dataframe are reformatted to "%Y%m%d" so
+        they can be matched against the requested date.
+        If there is no news for the given date, prints "Error: No news found on
+        this date"; if cache.csv cannot be found or is empty, prints
+        "Error: News cache not found.."; if limit <= 0, prints "News limit must
+        be a positive number.."; if a source is given and there is no news for
+        the given date from that source, prints "Error: No news found on this
+        date with given source."
+        This method also concatenates several dataframes into one, so that when
+        a limit is given the news from all sources can be sorted by date before
+        the required items are printed.
+
+        :param limit: limits the number of news items printed to stdout or
+        converted to the requested format.
+        :return: None
+        """
+        cache_data = self.reformat_the_dates()
+
+        try:
+            news_on_date = cache_data[(cache_data.Date == self.date)].reset_index(drop=True)
+            if len(news_on_date) == 0:
+                print("Error: No news found on this date")
+                return
+            else:
+                news_not_on_date = cache_data[(cache_data.Date != self.date)].reset_index(drop=True)
+                news_on_date_with_source = news_on_date[(news_on_date.Source == self.url)].reset_index(drop=True)
+                news_not_on_date_with_source = news_not_on_date[(news_not_on_date.Source == self.url)].reset_index(
+                    drop=True)
+                news_on_date_without_source = cache_data[(cache_data.Date == self.date) & (
+                    cache_data.Source != self.url)].reset_index(drop=True)
+                another_news = cache_data[(cache_data.Date != self.date) & (cache_data.Source != self.url)].reset_index(
+                    drop=True)
+                limit_date = pandas.concat([news_on_date, news_not_on_date]).reset_index(drop=True)
+                limit_date_source = pandas.concat(
+                    [news_on_date_with_source, news_on_date_without_source,
+                     news_not_on_date_with_source, another_news]).reset_index(drop=True)
+
+        except AttributeError:
+            print("Error: News cache not found..")
+        else:
+            if limit is None:
+                if self.url is None:
+                    cache = self.df_to_dict(news_on_date)
+                else:
+                    if len(news_on_date_with_source) == 0:
+                        print("Error: No news found on this date with given source.")
+                        return
+                    cache = self.df_to_dict(news_on_date_with_source)
+            else:
+                if limit > len(cache_data):
+                    limit = len(cache_data)
+                elif limit <= 0:
+                    print("News limit must be a positive number..")
+                    return
+
+                if self.url is None:
+                    cache = self.df_to_dict(limit_date.head(limit))
+                else:
+                    if len(news_on_date_with_source) == 0:
+                        print("Error: No news found on this date with given source.")
+                        return
+                    cache = self.df_to_dict(limit_date_source.head(limit))
+            self.convert(cache)
diff --git a/DmitriyVkhtiuk_FT/rss_reader/rss_reader.py b/DmitriyVkhtiuk_FT/rss_reader/rss_reader.py
new file mode 100644
index 00000000..2f29970e
--- /dev/null
+++
b/DmitriyVkhtiuk_FT/rss_reader/rss_reader.py
@@ -0,0 +1,47 @@
+from modified_argparser import ArgParser
+from news_brain import NewsBrain
+from colorama import Fore
+
+parser = ArgParser(description="Pure Python command-line RSS reader.", add_help=True)
+parser.add_argument("source", type=str, help="RSS URL", nargs="?", default=None)
+parser.add_argument("--limit",
+                    help="Limit news topics if this parameter is provided. (MUST expect one argument)",
+                    default=None, type=int)
+parser.add_argument("--json", help="Print result as JSON in stdout.", action="store_true")
+parser.add_argument("--verbose", help="Outputs verbose status messages", action='store_true')
+parser.add_argument("--version", help="Print version info", action='version', version='Version 1.5')
+parser.add_argument("--date",
+                    help="Fetch news from local cache by date", default=None, type=str)
+parser.add_argument("--html", help="Convert news to .html", default=None, type=parser.valid_path, dest="html_path")
+parser.add_argument("--pdf", help="Convert news to .pdf", default=None, type=parser.valid_path, dest="pdf_path")
+parser.add_argument("--colorized", help="Outputs in colorized mode", action="store_true")
+
+
+def main():
+    """
+    Combines the rss-reader with the argparse module functionality. It uses the
+    NewsBrain class to get news from RSS feeds and the ArgParser class to parse
+    command line arguments, providing the command line interface to the user.
+    :return: None
+    """
+    args = parser.parse_args()
+    news = NewsBrain(args.source, args.limit, args.json, args.date, args.html_path, args.pdf_path, args.colorized)
+    if args.colorized:
+        print(Fore.LIGHTCYAN_EX)
+    lim = args.limit
+    if args.verbose:
+        news.create_logger()
+    if args.date is None:
+        xml = news.get_rss_data()
+
+        if xml is not None:
+            if lim is None or lim > len(xml.find_all("item")):
+                news.get_news(xml)
+            else:
+                news.get_news(xml, lim)
+
+    else:
+        news.print_from_cache(lim)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/DmitriyVkhtiuk_FT/rss_reader/template.py b/DmitriyVkhtiuk_FT/rss_reader/template.py
new file mode 100644
index 00000000..3f94e433
--- /dev/null
+++ b/DmitriyVkhtiuk_FT/rss_reader/template.py
@@ -0,0 +1,12 @@
+temp = '''
+<!DOCTYPE html>
+<html>
+<head>
+<meta charset="utf-8">
+<title>Converted version</title>
+</head>
+<body>
+{outp}
+</body>
+</html>
+'''
\ No newline at end of file
diff --git a/DmitriyVkhtiuk_FT/rss_reader/test_files/__init__.py b/DmitriyVkhtiuk_FT/rss_reader/test_files/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/DmitriyVkhtiuk_FT/rss_reader/test_files/test_get_news.txt b/DmitriyVkhtiuk_FT/rss_reader/test_files/test_get_news.txt
new file mode 100644
index 00000000..d2cca44b
--- /dev/null
+++ b/DmitriyVkhtiuk_FT/rss_reader/test_files/test_get_news.txt
@@ -0,0 +1,30 @@
+<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
+<channel>
+<title>BuzzFeed News</title>
+<link>https://www.buzzfeednews.com</link>
+<description>BuzzFeed, Reporting To You</description>
+<language>en</language>
+<copyright>Copyright 2022 BuzzFeed, Inc.</copyright>
+<lastBuildDate>Tue, 28 Jun 2022 03:11:36 +0000</lastBuildDate>
+<managingEditor>editor@buzzfeed.com (https://www.buzzfeednews.com/article/buzzfeednews/about-buzzfeed-news)</managingEditor>
+<webMaster>editor@buzzfeed.com (https://www.buzzfeednews.com/article/buzzfeednews/about-buzzfeed-news)</webMaster>
+<image>
+<url>https://webappstatic.buzzfeed.com/static/images/public/rss/logo-news.png</url>
+<title>BuzzFeed News</title>
+<link>https://www.buzzfeednews.com</link>
+</image>
+<item>
+<title>She Was One Year Away From Going To College. Then The Taliban Banned Her From School.</title>
+<description><![CDATA[The policy prohibiting girls from attending school after sixth grade contradicts the regime’s previous promises to loosen restrictions on education rights.
+View Entire Post ›
+]]></description>
+<guid isPermaLink="false">https://www.buzzfeednews.com/article/syedzabiullah/afghanistan-taliban-girls-school-ban</guid>
+<pubDate>Mon, 13 Jun 2022 20:32:05 -0400</pubDate>
+<link>https://www.buzzfeednews.com/article/syedzabiullah/afghanistan-taliban-girls-school-ban</link>
+<category>Inequality</category>
+<dc:creator>Syed Zabiullah Langari</dc:creator>
+</item>
+</channel>
+</rss>
\ No newline at end of file
diff --git a/DmitriyVkhtiuk_FT/rss_reader/tests.py b/DmitriyVkhtiuk_FT/rss_reader/tests.py
new file mode 100644
index 00000000..ded0e00b
--- /dev/null
+++ b/DmitriyVkhtiuk_FT/rss_reader/tests.py
@@ -0,0 +1,247 @@
+"""
+File contains tests for the rss_reader.py script.
+"""
+import unittest
+from unittest import TestCase
+from unittest.mock import patch, call
+from modified_argparser import *
+from news_brain import *
+
+
+class TestRssReader(TestCase):
+    @patch("builtins.print")
+    def test_not_valid_path(self, mock_print):
+        """
+        Tests an invalid dir path and checks that the default dir path is returned.
+        :param mock_print: mock of builtins.print
+        :return: None
+        """
+        p = ArgParser()
+        path = Path.home().parent / "xdd"
+        result = p.valid_path(path)
+        expected = Path.cwd() / "default_dir"
+        self.assertEqual(result, expected)
+        self.assertEqual(mock_print.mock_calls, [call("Invalid path.. Saving to the project default dir")])
+
+    def test_valid_path(self):
+        """
+        Tests a valid path and checks that the valid directory path is returned.
+        :return: None
+        """
+        p = ArgParser()
+        path = Path.home()
+        result = p.valid_path(path)
+        expected = Path(path)
+        self.assertEqual(result, expected)
+
+    @patch("requests.get")
+    def test_valid_get_rss_data(self, mock_get):
+        """
+        Mocks requests.get and checks the get_rss_data method against a created template.
+        :param mock_get: part of an xml template to check the work of get_rss_data
+        :return: None
+        """
+        news = NewsBrain("https://news.yahoo.com/rss/", None, False, None, None, None, None)
+        mock_get.return_value.text = '<rss version="2.0"></rss>'
+        expected_result = bs4.BeautifulSoup(
+            '<rss version="2.0">'
+            '</rss>',
+            "xml")
+        self.assertEqual(news.get_rss_data(), expected_result)
+
+    @patch("requests.get")
+    def test_get_news_text(self, mock_get):
+        """
+        Tests the get_news_text method when the description tag is not in the bs4.BeautifulSoup object.
+        Mocks requests.get with a template of an HTML page containing an article tag and checks the work of the method.
+        :return: None
+        """
+        news = NewsBrain("https://news.yahoo.com/rss/", None, False, None, None, None, None)
+        mock_get.return_value.content = "<html><body><article>abcd</article></body></html>"
+        self.assertEqual(news.get_news_text("https://news.yahoo.com/senator-2010-deposition-13-olds-152104735.html"),
+                         "abcd")
+
+    @patch("pandas.read_csv")
+    def test_reformat_the_dates(self, mock_read):
+        """
+        Mocks pandas.read_csv with a template DataFrame to check the work of the
+        reformat_the_dates method without reading "cache.csv".
+        :return: None
+        """
+        news = NewsBrain(None, None, False, None, None, None, None)
+        test_date_dict = {
+            "Date": ["Tue, 25 May 2021 23:45:43 GMT", "2022-03-30T22:21:05Z", "Fri, 13 Jun 2022 17:15:00 -0400",
+                     "2022-06-04T01:00:00+04:00"]
+        }
+        mock_read.return_value = pandas.DataFrame(test_date_dict)
+        expected_dict = {
+            "Date": ["20220613", "20220604", "20220330", "20210525"]
+        }
+        expected = pandas.DataFrame(expected_dict)
+        self.assertEqual(True, news.reformat_the_dates().reset_index(drop=True).equals(expected.reset_index(drop=True)))
+
+    @patch("builtins.print")
+    @patch("pandas.read_csv")
+    def test_invalid_reformat_the_dates(self, mock_read, mock_print):
+        """
+        Tests the pubDate case where no template exists to reformat the date.
+        Mocks pandas.read_csv to check the work without the file "cache.csv",
+        and mocks builtins.print to check that the right message is printed.
+
+        :return: None
+        """
+        news = NewsBrain(None, None, False, None, None, None, None)
+        test_date_dict = {
+            "Date": ["Tue, 25 May 2021 23:45:43 GMT", "2022-03-30T22:21:05Z", "Fri, 13 Jun 2022 17:15:00 -0400",
+                     "2022-06-04T01:00:00+04:00+35"]
+        }
+        mock_read.return_value = pandas.DataFrame(test_date_dict)
+        expected_dict = {
+            "Date": ["20220613", "20220330", "2022-06-04T01:00:00+04:00+35", "20210525"]
+        }
+        expected = pandas.DataFrame(expected_dict)
+        news.reformat_the_dates().reset_index(drop=True)
+        self.assertEqual(mock_print.mock_calls, [call("Can't reformat the date 2022-06-04T01:00:00+04:00+35")])
+        self.assertEqual(True, news.reformat_the_dates().reset_index(drop=True).equals(expected))
+
+    @patch("builtins.print")
+    @patch("bs4.BeautifulSoup.find_all")
+    def test_invalid_data_get_news(self, mock_val, mock_print):
+        """
+        Mocks the find_all method of the bs4.BeautifulSoup class with a template
+        of an invalid xml page, and mocks builtins.print to check the error
+        message printed to stdout.
+        :return: None
+        """
+        news = NewsBrain("https://news.google.com/rss/", None, False, None, None, None, None)
+        mock_val.return_value = []
+        xml_data = bs4.BeautifulSoup(
+            '<rss version="2.0"></rss>', "xml")
+        news.get_news(xml_data)
+        self.assertEqual(mock_print.mock_calls, [call("News Not Found. Please, check your RSS URL.")])
+
+    @patch("builtins.print")
+    def test_df_to_dict(self, mock_print):
+        """
+        Tests that the initial dataframe is unchanged after converting it to a
+        dict and printing it out. Mocks printing the dict to the console.
+        :return: None
+        """
+        news = NewsBrain(None, None, False, None, None, None, None)
+        d = {
+            "Source": ["b", "c", "d"],
+            "Feed": ["e", "f", "g"],
+            "Title": ["1", "2", "3"],
+            "Date": ["2", "4", "5"],
+            "Link": ["12", "32", "21"],
+            "Description": ["A", "B", "C"],
+            "Image": ["jpg", "png", "jpeg"]
+        }
+        mock_print.return_value = None
+        df = pandas.DataFrame(d)
+        self.assertEqual(True, news.df_to_dict(df).equals(df))
+
+    @patch("builtins.print")
+    def test_invalid_limit_get_news(self, mock_print):
+        """
+        Tests the method's behavior with an invalid limit but valid data; the
+        template used for the check is taken from the test_files dir. Mocks
+        builtins.print to check the message printed in this case.
+        :return: None
+        """
+        news = NewsBrain("https://www.buzzfeed.com/world.xml", 0, False, None, None, None, None)
+        with open(work_dir / "test_files" / "test_get_news.txt") as f:
+            xml_data = bs4.BeautifulSoup(f.read(), "xml")
+        news.get_news(xml_data, 0)
+        self.assertEqual(mock_print.mock_calls, [call("News limit must be positive number")])
+
+    @patch("builtins.print")
+    @patch("pandas.read_csv")
+    def test_print_from_cache_invalid_limit(self, mock_reformat_the_dates, mock_print):
+        """
+        Tests the method's behavior with an invalid limit but a valid date.
+        Mocks pandas.read_csv to use a template DataFrame, and mocks
+        builtins.print to check the message printed in this case.
+        :return: None
+        """
+        test_dict = {
+            "Date": ["Tue, 25 May 2021 23:45:43 GMT", "2022-03-30T22:21:05Z", "Fri, 13 Jun 2022 17:15:00 -0400",
+                     "2022-06-04T01:00:00+04:00"],
+            "Source": ["https://rss.art19.com/apology-line", "https://rss.art19.com/apology-line",
+                       "https://rss.art19.com/apology-line", "https://rss.art19.com/apology-line"]
+        }
+        expected = pandas.DataFrame(test_dict)
+        mock_reformat_the_dates.return_value = expected
+        news = NewsBrain(None, 0, False, "20220613", None, None, None)
+        news.print_from_cache(0)
+        self.assertEqual(mock_print.mock_calls, [call("News limit must be a positive number..")])
+
+    @patch("builtins.print")
+    @patch("pandas.read_csv")
+    def test_print_from_cache_no_news_found(self, mock_reformat_the_dates, mock_print):
+        """
+        Tests the method's behavior with a valid limit but an invalid date.
+        Mocks pandas.read_csv to use a template DataFrame, and mocks
+        builtins.print to check the message printed in this case.
+        :return: None
+        """
+        test_dict = {
+            "Date": ["Tue, 25 May 2021 23:45:43 GMT", "2022-03-30T22:21:05Z", "Fri, 13 Jun 2022 17:15:00 -0400",
+                     "2022-06-04T01:00:00+04:00"],
+            "Source": ["https://rss.art19.com/apology-line", "https://rss.art19.com/apology-line",
+                       "https://rss.art19.com/apology-line", "https://rss.art19.com/apology-line"]
+        }
+        expected = pandas.DataFrame(test_dict)
+        mock_reformat_the_dates.return_value = expected
+        news = NewsBrain(None, 5, False, "20220603", None, None, None)
+        news.print_from_cache(5)
+        self.assertEqual(mock_print.mock_calls, [call("Error: No news found on this date")])
+
+    @patch("builtins.print")
+    @patch("pandas.read_csv")
+    def test_print_from_cache_with_url_specified(self, mock_reformat_the_dates, mock_print):
+        """
+        Tests the method's behavior with a valid limit and a valid date but an
+        invalid source. Mocks pandas.read_csv to use a template DataFrame, and
+        mocks builtins.print to check the message printed in this case.
+        :return: None
+        """
+        test_dict = {
+            "Date": ["Tue, 25 May 2021 23:45:43 GMT", "2022-03-30T22:21:05Z", "Fri, 13 Jun 2022 17:15:00 -0400",
+                     "2022-06-04T01:00:00+04:00"],
+            "Source": ["https://rss.art19.com/apology-line", "https://rss.art19.com/apology-line",
+                       "https://rss.art19.com/apology-line", "https://rss.art19.com/apology-line"]
+        }
+        expected = pandas.DataFrame(test_dict)
+        mock_reformat_the_dates.return_value = expected
+        news = NewsBrain("https://vse.sale/news/rss", 2, False, "20220613", None, None, None)
+        news.print_from_cache(2)
+        self.assertEqual(mock_print.mock_calls, [call("Error: No news found on this date with given source.")])
+
+    @patch("pandas.DataFrame.to_csv")
+    @patch("builtins.print")
+    def test_valid_get_news(self, mock_print, mock_pandas):
+        """
+        Checks the behaviour of the get_news() method with a valid limit and the
+        valid template 'test_get_news.txt', which is part of a real xml page.
+        Mocks builtins.print to check the news printed in this case.
+        :return: None
+        """
+        news = NewsBrain("https://www.buzzfeed.com/world.xml", 1, False, None, None, None, None)
+        with open(work_dir / "test_files" / "test_get_news.txt", encoding="utf-8") as f:
+            xml_data = bs4.BeautifulSoup(f.read(), "xml")
+        news.get_news(xml_data)
+        mock_pandas.return_value = None
+        self.assertEqual(mock_print.mock_calls, [
+            call('\x1b[39m'),
+            call('Source: https://www.buzzfeed.com/world.xml'),
+            call('Feed: BuzzFeed News'),
+            call('Title: She Was One Year Away From Going To College. Then The Taliban Banned Her From School.'),
+            call('Date: Mon, 13 Jun 2022 20:32:05 -0400'),
+            call('Link: https://www.buzzfeednews.com/article/syedzabiullah/afghanistan-taliban-girls-school-ban'),
+            call('Description: BuzzFeed, Reporting To You'),
+            call('Image: https://webappstatic.buzzfeed.com/static/images/public/rss/logo-news.png'),
+            call('\n\n')])
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/DmitriyVkhtiuk_FT/setup.py b/DmitriyVkhtiuk_FT/setup.py
new file mode 100644
index 00000000..59b05f65
--- /dev/null
+++ b/DmitriyVkhtiuk_FT/setup.py
@@ -0,0 +1,27 @@
+from setuptools import setup, find_packages
+
+package_data = \
+    {'': ['*']}
+
+with open("requirements.txt", "r", encoding="utf-8") as reqs:
+    requirements = reqs.read()
+with open("README.md", "r", encoding="utf-8") as readme:
+    description = readme.read()
+entry_points = \
+    {'console_scripts': ['rss_reader = rss_reader.rss_reader:main']}
+
+setup_kwargs = {
+    'name': 'rss-reader',
+    'version': '1.5',
+    'description': 'A simple CLI rss reader',
+    'author': 'DVikhtiuk',
+    'author_email': 'dimastol1ca@gmail.com',
+    'long_description': description,
+    'packages': find_packages(),
+    'package_data': package_data,
+    'install_requires': requirements,
+    'entry_points': entry_points,
+    'python_requires': '>=3.7,<4.0',
+}
+
+setup(**setup_kwargs)
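A note on the date handling in `reformat_the_dates` (news_brain.py): it normalizes the mixed `pubDate` styles found in RSS feeds by trying several `strptime` patterns in turn and keeping the raw string when none match. A minimal standalone sketch of that approach — the `normalize_date` helper below is illustrative only and not part of the package:

```python
# Sketch of the multi-format pubDate normalization used by reformat_the_dates().
from datetime import datetime

DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S %z",   # e.g. Mon, 13 Jun 2022 20:32:05 -0400
    "%Y-%m-%dT%H:%M:%SZ",         # e.g. 2022-03-30T22:21:05Z
    "%a, %d %b %Y %H:%M:%S %Z",   # e.g. Tue, 25 May 2021 23:45:43 GMT
    "%Y-%m-%dT%H:%M:%S%z",        # e.g. 2022-06-04T01:00:00+04:00
)


def normalize_date(raw):
    """Return the date as a sortable YYYYMMDD string, or the raw value if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    return raw  # unparseable dates are kept as-is, mirroring the script's fallback


print(normalize_date("Mon, 13 Jun 2022 20:32:05 -0400"))  # 20220613
```

Each of the four pubDate styles present in the cached test data collapses to the same sortable `YYYYMMDD` key, which is what lets `print_from_cache` compare and sort dates as plain strings.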