diff --git a/doc/widgets/YouGet.rst b/doc/widgets/YouGet.rst new file mode 100644 index 00000000..e2e818d1 --- /dev/null +++ b/doc/widgets/YouGet.rst @@ -0,0 +1,130 @@ + +.. meta:: + :description: Orange3 Textable Prototypes documentation, YouGet widget + :keywords: Orange3, Textable, Prototypes, documentation, YouGet, widget + +.. _YouGet: + +YouGet +======= + +.. image:: figures/YouGet.svg + +Import YouTube video comments (``_). + +Author +------ + +Virgile Albasini, Sophie Ward, Lorelei Chevroulet, and Vincent Joris. + +Signals +------- + +Inputs: + +* None + +Outputs: + +* The comments from a YouTube video in the form of a segmentation + + +Description +----------- + +This widget provides a graphical interface for entering the URL of one or more YouTube videos and +downloading their comments (``_). +The output is a segmentation containing the desired **Number of Comments** from the chosen YouTube video(s). + +* Enter one or more URL(s) in the **URL(s)** field in the following format: URL1, URL2, URL3, etc. +* **Add** them to the **Sources** section +* Select the **Number of Comments** you would like to download from the YouTube video +* Select whether you would like them sorted by **Date** or **Popularity** +* Press **Send** to see the comments as well as some additional information in the form of a segmentation + +Interface +~~~~~~~~~~~~~~~ + +User controls are divided into two main sections: **Sources** and **More Options**. + +**Sources** contains the **URL(s)** field and the **Add**, **Clear All** and **Remove** buttons. + +**More Options** contains the **Select number of comments** and **Sort by** menus. +In **Select number of comments**, the user selects the number of comments +they wish to see displayed, and in **Sort by**, the user chooses to sort the comments either by **Date** or by **Popularity**. + +.. _YouGet_principal: + +.. 
figure:: figures/YouGet_principal.png + :align: center + :alt: Interface of the YouGet widget + :height: 600px + + Figure 1: **YouGet** widget interface. + +Sources +******* + +The **Sources** section contains the controls for managing the list of YouTube URLs whose comments **YouGet** will download. + +The user enters one or more YouTube video URLs whose comments they would like to extract. To confirm the URL(s), they press the **Add** button, which +adds the URL(s) to the **Sources** list above. The user can add one or more URLs to this list. +To remove a URL, they select it in the list and press the **Remove** button. To remove not just one +but all of the URLs, they press the **Clear All** button. + +More Options +************ + +The **More Options** section contains the controls for selecting the **number of comments** desired in output and for sorting the comments by **Date** or by **Popularity**. The user can choose between **1 comment (the minimum), 5, 10, 100, 1000, 10'000 or no limit** +of comments in output. When sorting by **Date**, the oldest comment appears first in the list. When sorting by **Popularity**, the most liked comment appears first. Once the user presses the **Send** button, the comments are sent to output in the form +of a segmentation. + +.. figure:: figures/YouGet_5comments.png + :align: center + :alt: Interface of the YouGet widget with 5 comments + :height: 600px + + Figure 2: **YouGet** widget output with **5 comments** selected and sorted by **Date**. + +.. figure:: figures/YouGet_10comments.png + :align: center + :alt: Interface of the YouGet widget with 10 comments + :height: 600px + + Figure 3: **YouGet** widget output with **10 comments** selected and sorted by **Popularity**. 
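The effect of the **More Options** controls can be pictured with a minimal sketch. It is not the widget's implementation: the sample records, the ``select_comments`` helper and its parameters are hypothetical, and it assumes that comments are sorted first and limited afterwards.

```python
from datetime import datetime

# Hypothetical comment records; the field names are assumptions
# made for this illustration only.
comments = [
    {"text": "first", "votes": 3, "time": datetime(2024, 1, 5)},
    {"text": "second", "votes": 10, "time": datetime(2023, 6, 1)},
    {"text": "third", "votes": 7, "time": datetime(2025, 2, 9)},
]


def select_comments(comments, sort_by="Date", limit=None):
    """Sort comments, then keep at most `limit` of them (None = no limit)."""
    if sort_by == "Date":
        # Oldest comment first
        ordered = sorted(comments, key=lambda c: c["time"])
    elif sort_by == "Popularity":
        # Most liked comment first
        ordered = sorted(comments, key=lambda c: c["votes"], reverse=True)
    else:
        raise ValueError(f"unknown sort key: {sort_by}")
    return ordered if limit is None else ordered[:limit]


print([c["text"] for c in select_comments(comments, "Date", 2)])
print([c["text"] for c in select_comments(comments, "Popularity", 2)])
```

Passing ``limit=None`` stands in for the **no limit** choice of the menu.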
+ +Messages +-------- + +Information +~~~~~~~~~~~ + +*<n> segments sent to output (<m> characters).* + This confirms that the widget has operated correctly and that the segments have been sent to output. + +Warnings +~~~~~~~~ + +*Settings were changed, please click 'Send' when ready.* + Settings have changed but the **Send automatically** checkbox has not been selected, + so the user is prompted to click the **Send** button (or equivalently check the box) + in order for computation and data emission to proceed. + +*Step 1/2: Processing...* + The requested analysis is being performed. + +Errors +~~~~~~~~ + +*(nb) duplicate URL(s) found and deleted* + The widget has found one or more duplicate URL(s) and discarded them instead of adding them. + +*(nb) URL(s) are not valid YouTube videos* + The widget has detected that the URL(s) are misspelt and has not added them to the list. + +*One or more elements are not YouTube URLs or please check your internet connection* + The widget has detected a problem, either with the URL(s) themselves or with the internet connection. If several elements are entered in the **URL(s)** field and one of them is not a URL, then none of these elements is added to the **Sources** section. If the internet connection is interrupted while URLs are being added to the **Sources** section or while the comments are being loaded, an error message asks the user to check their internet connection. + +Note +~~~~~~~~ + +When the widget starts and URL(s) are first entered, the **Add** button appears gray; it is nevertheless functional. Press the **Add** button to add your URL(s) to the **Sources** section. 
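The URL checks behind the messages listed under **Errors** can be sketched as follows. This is an illustration under stated assumptions, not the widget's code: the ``triage_urls`` helper and the sample URLs are hypothetical, only the regular expression is the one used in the widget's source, and the network check for video availability is left out.

```python
import re

# Pattern used in the widget source; it only checks the shape of the URL,
# not whether the video actually exists.
YOUTUBE_URL = re.compile(
    r"^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/.+$"
)


def triage_urls(field_text, existing):
    """Split a comma-separated field and sort its entries into three bins."""
    accepted, duplicates, invalid = [], [], []
    for url in (part.strip() for part in field_text.split(",")):
        if not url:
            continue  # ignore empty entries such as a trailing comma
        if url in existing or url in accepted:
            duplicates.append(url)
        elif not YOUTUBE_URL.match(url):
            invalid.append(url)
        else:
            accepted.append(url)
    return accepted, duplicates, invalid


existing = ["https://www.youtube.com/watch?v=ScMzIvxBSi4"]
field = "https://www.youtube.com/watch?v=ScMzIvxBSi4, https://youtu.be/abc123, not a url"
accepted, duplicates, invalid = triage_urls(field, existing)
print(accepted, duplicates, invalid)
```

Keeping the three bins separate makes it straightforward to report one count per message, as the widget's dialogs do.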
\ No newline at end of file diff --git a/doc/widgets/figures/YouGet.svg b/doc/widgets/figures/YouGet.svg new file mode 100644 index 00000000..0a925e1c --- /dev/null +++ b/doc/widgets/figures/YouGet.svg @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/doc/widgets/figures/YouGet_10comments.png b/doc/widgets/figures/YouGet_10comments.png new file mode 100644 index 00000000..4360896b Binary files /dev/null and b/doc/widgets/figures/YouGet_10comments.png differ diff --git a/doc/widgets/figures/YouGet_5comments.png b/doc/widgets/figures/YouGet_5comments.png new file mode 100644 index 00000000..443c9f1e Binary files /dev/null and b/doc/widgets/figures/YouGet_5comments.png differ diff --git a/doc/widgets/figures/YouGet_principal.png b/doc/widgets/figures/YouGet_principal.png new file mode 100644 index 00000000..6c2e37b4 Binary files /dev/null and b/doc/widgets/figures/YouGet_principal.png differ diff --git a/orangecontrib/textable_prototypes/widgets/YouGet.py b/orangecontrib/textable_prototypes/widgets/YouGet.py new file mode 100644 index 00000000..e23d99c1 --- /dev/null +++ b/orangecontrib/textable_prototypes/widgets/YouGet.py @@ -0,0 +1,785 @@ +""" Imports """ +from functools import partial +import time +import json +import re +import dateparser +from datetime import datetime + +from _textable.widgets.TextableUtils import ( + OWTextableBaseWidget, VersionedSettingsHandler, ProgressBar, + InfoBox, SendButton, pluralize, Task +) + +from LTTL.Segmentation import Segmentation +from LTTL.Input import Input + + +# Using the threaded version of LTTL.Segmenter to create +# a "responsive" widget. 
+import LTTL.SegmenterThread as Segmenter + +from Orange.widgets import widget, gui, settings +from Orange.widgets.utils.widgetpreview import WidgetPreview + +from youtube_comment_downloader import * +# Used to test the URL +import requests + +from PyQt5.QtWidgets import QMessageBox +from Orange.widgets.settings import Setting + +""" +Class YouGet +Copyright 2025 University of Lausanne +----------------------------------------------------------------------------- +This file is part of the Orange3-Textable-Prototypes package. + +Orange3-Textable-Prototypes is free software: you can redistribute +it and/or modify it under the terms of the GNU General Public License +as published by the Free Software Foundation, either version 3 of the +License, or (at your option) any later version. + +Orange3-Textable-Prototypes is distributed in the hope that it will +be useful, but WITHOUT ANY WARRANTY; without even the implied warranty +of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with Orange3-Textable-Prototypes. If not, see + . 
+""" + +""" +Sources that helped us code our widget "YouGet": + - ChatGPT (including GPT-3.5 and limited GPT-4o mini): Used ChatGPT to help with + regex to only accept YouTube URLs ("https://chatgpt.com/ + share/6800c404-cb74-8000-afef-e321b9517c47") + - Draw.io: Used Draw.io for the YouGet logo (https://app.diagrams.net/) + - Widget SciHub: for sections of code that both widgets have in common + (https://github.com/sarahperettipoix/orange3 + -textable-prototypes/blob/master/orangecontrib/textable_prototypes/widgets/SciHubatorTest.py) + - url settings: (https://www.youtube.com/watch?v=ScMzIvxBSi4) +""" + +__version__ = '0.0.1' +__author__ = "Virgile Albasini, Sophie Ward, Lorelei Chevroulet, Vincent Joris" +__maintainer__ = "Aris Xanthos" +__email__ = "aris.xanthos@unil.ch" + +class YouGet(OWTextableBaseWidget): + """Orange3-Textable widget that downloads YouTube comments""" + + name = "YouGet" + description = "Widget that downloads comments from a YouTube URL" + icon = "icons/YouGet.svg" + priority = 99 + + # Input and output channels (remove if not needed)... + inputs = [] + outputs = [("New segmentation", Segmentation)] + + # Copied verbatim in every Textable widget to facilitate + # settings management. + settingsHandler = VersionedSettingsHandler( + version=__version__.rsplit(".", 1)[0] + ) + + # Settings... 
+ # url = settings.Setting("https://www.youtube.com/watch?v=ScMzIvxBSi4") + url = settings.Setting("") + + # widget will fetch n=0 comments -> default is all + # n_desired_comments = 0 + # n_desired_comments = 1 # for testing + + + want_main_area = False + + #---------- START: The following section of code has been borrowed from SciHub.py ---------- + # (https://github.com/sarahperettipoix/orange3-textable-prototypes/ + # blob/master/orangecontrib/textable_prototypes/widgets/SciHubatorTest.py) + DOIs = Setting([]) + URLLabel = Setting([]) + selectedURLLabel = Setting([]) + new_url = Setting("") + autoSend = settings.Setting(False) + DOI = Setting(u'') + n_desired_comments = Setting("") + sortBy = Setting("Date") + #---------- END: End of the section of code borrowed from SciHub.py ---------- + + + def __init__(self, *args, **kwargs): + """ + Initializing the widget with GUI components and internal state. + + Part of the GUI layout and URL management logic is adapted from: + https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/orangecontrib/textable_prototypes/widgets/SciHubatorTest.py + """ + super().__init__(*args, **kwargs) + #---------- START: The following section of code has been borrowed from SciHub.py ---------- + # (https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/ + # master/orangecontrib/textable_prototypes/widgets/SciHubatorTest.py) + + # self.URLLabel = list() + # self.selectedURLLabel = list() + # self.new_url = u'' + # self.extractedText = u'' + # self.DOI = u'' + # self.DOIs = list() + #---------- END: End of the section of code borrowed from SciHub.py ---------- + + # Attributes... + self.inputSegmentationLength = 0 + + # This attribute stores scraped comments to prevent duplicate + # queries and make the widget both faster and less resource-intensive. 
+ # Comments are stored as follows: + # 'url': list of comments on url + self.cached_comments = {} + + # This attribute stores a per-widget number of comments desired as + # output. This can be changed by the user at any time via the GUI. + # n_desired_comments = 1 + # The following attribute is required by every widget + # that imports new strings into Textable. + self.createdInputs = list() + + self.infoBox = InfoBox(widget=self.controlArea) + self.sendButton = SendButton( + widget=self.controlArea, + master=self, + callback=self.sendData, + cancelCallback=self.cancel_manually, + infoBoxAttribute="infoBox", + ) + #---------- START: The following section of code has been borrowed from SciHub.py ---------- + # (https://github.com/sarahperettipoix/orange3-textable-prototypes/blob/master/orangecontrib + # /textable_prototypes/widgets/SciHubatorTest.py) + + # URL box + URLBox = gui.widgetBox( + widget=self.controlArea, + box=u'Sources', + orientation='vertical', + addSpace=False, + ) + URLBoxLine1 = gui.widgetBox( + widget=URLBox, + box=False, + orientation='horizontal', + addSpace=True, + ) + self.fileListbox = gui.listBox( + widget=URLBoxLine1, + master=self, + value='selectedURLLabel', + labels='URLLabel', + callback=self.updateURLBoxButtons, + tooltip=( + u"The list of URLs whose comments will be imported.\n" + ), + ) + URLBoxCol2 = gui.widgetBox( + widget=URLBoxLine1, + orientation='vertical', + ) + self.removeButton = gui.button( + widget=URLBoxCol2, + master=self, + label=u'Remove', + callback=self.remove, + tooltip=( + u"Remove the selected URL from the list." + ), + disabled = True, + ) + self.clearAllButton = gui.button( + widget=URLBoxCol2, + master=self, + label=u'Clear All', + callback=self.clearAll, + tooltip=( + u"Remove all URLs from the list." 
+ ), + disabled = True, + ) + URLBoxLine2 = gui.widgetBox( + widget=URLBox, + box=False, + orientation='vertical', + ) + # Add URL box + addURLBox = gui.widgetBox( + widget=URLBoxLine2, + box=True, + orientation='vertical', + addSpace=False, + ) + gui.lineEdit( + widget=addURLBox, + master=self, + value='new_url', + orientation='horizontal', + label=u'URL(s):', + labelWidth=101, + callback=self.updateURLBoxButtons, + tooltip=( + u"The URL(s) that will be added to the list when\n" + u"button 'Add' is clicked.\n\n" + u"Successive URLs must be separated with ' , ' \n" + u"Their order in the list will be the same as\n" + u"in this field." + ), + ) + advOptionsBox = gui.widgetBox( + widget=self.controlArea, + box=u'More Options', + orientation='vertical', + addSpace=False, + ) + self.optionLine1 = gui.widgetBox( + widget=advOptionsBox, + orientation='horizontal', + addSpace=False, + ) + commentsSelector = gui.comboBox( + widget=advOptionsBox, + master=self, + orientation='horizontal', + value='n_desired_comments', + label='Select number of comments:', + tooltip='Default 0 is all comments.', + items=[1, 5, 10, 100, 1000, 10000, "No limit"], + sendSelectedValue=True, + labelWidth=220, + ) + + self.sortByFilter = gui.widgetBox( + widget=advOptionsBox, + orientation='horizontal', + addSpace=False, + ) + + sortBy = gui.comboBox( + widget=self.sortByFilter, + master=self, + value='sortBy', + label=u'Sort by:', + tooltip= "Choose how the comment will be sorted", + orientation='horizontal', + sendSelectedValue=True, + items=["Date", "Popularity"], + labelWidth=220, + ) + + gui.separator(widget=addURLBox, height=3) + self.addButton = gui.button( + widget=addURLBox, + master=self, + label=u'Add', + callback=self.add, + tooltip=( + u"Add the URL currently displayed in the 'URL'\n" + u"text field to the list." 
+ ), + disabled = True, + ) + gui.rubber(self.controlArea) + # So that the "Add" button is not gray + self.updateURLBoxButtons() + self.sendButton.draw() + self.infoBox.draw() + self.sendButton.sendIf() + + #---------- END: End of the section of code borrowed from SciHub.py ---------- + + # Allows to save settings when quit and reload + self.fileListbox.update() + URLBoxLine1.update() + self.DOIs = list(set(self.DOIs)) + self.URLLabel = self.DOIs + self.selectedURLLabel = self.DOIs + commentsSelector.update() + self.n_desired_comments = self.n_desired_comments + self.sortBy = self.sortBy + self.sendButton.settingsChanged() + + def sendData(self): + """ + Perform every required check and operation + before calling the method that does the actual + processing. + """ + # If the widget creates new LTTL.Input objects (i.e. + # if it imports new strings in Textable), make sure to + # clear previously created Inputs with this method. + self.clearCreatedInputs() + + # Notify processing in infobox. Typically, there should + # always be a "processing" step, with optional "pre- + # processing" and "post-processing" steps before and + # after it. If there are no optional steps, notify + # "Preprocessing...". + self.infoBox.setText("Step 1/2: Processing...", "warning") + # Progress bar should be initialized at this point. + self.progressBarInit() + # Create a threaded function to do the actual processing + # and specify its arguments (here there are none). + threaded_function = partial( + self.processData, + # argument1, + # argument2, + # ... + ) + # Run the threaded function... + self.threading(threaded_function) + + + + def processData(self): + """ + Actual processing takes place in this method, + which is run in a worker thread so that GUI stays + responsive and operations can be cancelled. + """ + # At start of processing, set progress bar to 1%. + # Within this method, this is done using the following + # instruction. 
+ self.signal_prog.emit(1, False) + # The URLs to process are those listed in the Sources box. + urls = self.DOIs + # Indicate the total number of iterations that the + # progress bar will go through (e.g. number of input + # segments, number of selected files, etc.), then + # set current iteration to 1. + # Here, the number of segments means the number of URLs + max_itr = len(urls) + cur_itr = 1 + all_comments = [] + + # Actual processing... + # For each progress bar iteration... + #for _ in range(int(self.numberOfSegments)): + + for url in urls: + + # Update progress bar manually... + self.signal_prog.emit(int(100*cur_itr/max_itr), False) + cur_itr += 1 + + # If int(self.numberOfSegments) == 1: + if len(urls) == 1: + # self.captionTitle is the name of the widget, + # which will become the label of the output + # segmentation. + label = self.captionTitle + else: + label = None # will be set later. + + if url in self.cached_comments: + # print(f'▓ using the cache') + comments_ycd = self.cached_comments.get(url) + # print(f'▓ found {len(comments_ycd)} comments') + else: + # print(f'▓ not using the cache') + comments_ycd = self.scrape(url) + # print(f'▓ found {len(comments_ycd)} comments') + self.cached_comments[url] = comments_ycd + # print(f'▓ saved {len(self.cached_comments[url])} comments') + # print('▓▓————————▓▓ cache check happened! 
▓▓————————▓▓') + + all_comments.extend(comments_ycd) + + # If "No limit" is selected, its value will + # be 10 million + if self.n_desired_comments == "No limit": + limit = 10000000 + else: + limit = int(self.n_desired_comments) + + if limit != 0: + comments_ycd = all_comments + comments_ycd = comments_ycd[0:limit] + if self.sortBy == "Date": + sorted_comments = sorted( + comments_ycd, + key=lambda c: parse_date_safe(c["time"]) or datetime.max, + reverse=False # False = oldest first + ) + + elif self.sortBy == "Popularity": + sorted_comments = sorted(comments_ycd, key=lambda x: int(x["votes"]), reverse=True) + + for chose in sorted_comments: + myInput = Input(str(chose["text"]), label) + + segment = myInput[0] + segment.annotations["author"] = str(chose["author"]) + segment.annotations["url"] = url + segment.annotations["likes"] = str(chose["votes"]) + segment.annotations["time"] = str(chose["time"]) + myInput[0] = segment + + self.createdInputs.append(myInput) + + time.sleep(0.00001) + if self.cancel_operation: + self.signal_prog.emit(100, False) + return + + # Update infobox and reset progress bar... + self.signal_text.emit("Step 2/2: Post-processing...", + "warning") + self.signal_prog.emit(1, True) + + # If there's only one LTTL.Input created, it is the + # widget's output... + # if len(urls) == 1: + if len(self.createdInputs) == 1: + return self.createdInputs[0] + + return Segmenter.concatenate( + caller=self, + segmentations=self.createdInputs, + label=self.captionTitle, + import_labels_as=None, + ) + + @OWTextableBaseWidget.task_decorator + def task_finished(self, f): + """ + All operations following the successful termination + of self.processData + """ + # Get the result value of self.processData. + processed_data = f.result() + + # If it is not None... 
+ if processed_data: + message = f"{len(processed_data)} segment@p sent to output " + message = pluralize(message, len(processed_data)) + numChars = 0 + for segment in processed_data: + segmentLength = len(Segmentation.get_data(segment.str_index)) + numChars += segmentLength + message += f"({numChars} character@p)." + message = pluralize(message, numChars) + self.infoBox.setText(message) + self.send("New segmentation", processed_data) + + # The following method should be copied verbatim in + # every Textable widget. + def setCaption(self, title): + """ + Register captionTitle changes and send if needed + """ + if 'captionTitle' in dir(self): + changed = title != self.captionTitle + super().setCaption(title) + if changed: + self.cancel() # Cancel current operation + self.sendButton.settingsChanged() + else: + super().setCaption(title) + + # The following two methods should be copied verbatim in + # every Textable widget that creates LTTL.Input objects. + + def clearCreatedInputs(self): + """ + Clear created inputs + """ + # List of inputs/URLs + for i in self.createdInputs: + # Database: clearing ID of URL to clear, set value to None (= erase data) + Segmentation.set_data(i[0].str_index, None) + # GUI: clears contents of list + del self.createdInputs[:] + + def onDeleteWidget(self): + """ + Clear created inputs on widget deletion + """ + self.clearCreatedInputs() + + def youtube_video_existe(self, urll): + """ + This function tests the Internet connection. 
+ """ + # print( + # f'▓▓▓▓▓▓▓▓▓▓▓▓ youtube_video_existe(urll)\n' + # f'▓ youtube_video_existe() —— urll={urll}' + # ) + # Mimicking a browser so there is no blockage when requesting a URL + headers = { + "User-Agent": "Mozilla/5.0" + } + try: + response = requests.get(urll, headers=headers, timeout=5) + # print(f'▓ youtube_video_existe() —— headers test: {response}') + # print('▓ youtube_video_existe() —— work done :) returning.') + # print('▓▓▓▓▓▓▓▓▓▓▓▓ scrape() ▓▓▓▓▓▓▓▓▓▓▓▓') + return response.status_code + except requests.RequestException: + # print(f'▓ youtube_video_existe() —— headers errors') + # print('▓ youtube_video_existe() —— work done :) returning.') + # print('▓▓▓▓▓▓▓▓▓▓▓▓ scrape() ▓▓▓▓▓▓▓▓▓▓▓▓') + return False + + def scrape(self, url) -> list: + """ + Sets up a virtual browser through YoutubeCommentDownloader and uses + it to scrape all comments on a given url, returning them as a list. + """ + # print( + # f'▓▓▓▓▓▓▓▓▓▓▓▓ scrape(url)' + # f'▓ scrape() —— url={url}' + # ) + + # Fetch the comments + downloader = YoutubeCommentDownloader() + comments = downloader.get_comments_from_url(url,language='en') + every_comment = [x for x in comments] + # Prints number of comments found + # print( + # f'▓ scrape() —— returning {len(every_comment)} comment(s)' + # ) + # print('▓ scrape() —— work done :) returning.') + # print('▓▓▓▓▓▓▓▓▓▓▓▓ scrape() ▓▓▓▓▓▓▓▓▓▓▓▓') + # Returns the list of all comments collected + return every_comment + #---------- START: The following section of code has been borrowed from SciHub.py ---------- + # (https://github.com/sarahperettipoix/orange3-textable-prototypes/ + # blob/master/orangecontrib/textable_prototypes/widgets/SciHubatorTest.py) + + def clearAll(self): + """ + Remove all DOIs from DOIs attr + """ + del self.DOIs[:] + self.selectedURLLabel = [] + self.sendButton.settingsChanged() + self.URLLabel = self.URLLabel + self.clearAllButton.setDisabled(True) + + def remove(self): + """ + Remove URL from DOIs attr + """ + if 
self.selectedURLLabel: + index = self.selectedURLLabel[0] + self.DOIs.pop(index) + del self.selectedURLLabel[:] + self.sendButton.settingsChanged() + self.URLLabel = self.URLLabel + self.clearAllButton.setDisabled(not bool(self.URLLabel)) + + def add(self): + """ + Add the URLs entered in the URL(s) field to the DOIs attr + """ + # String of comma-separated URLs (url1, url2, ...) + # re.split(r'\s*,\s*') splits the string on commas, allowing whitespace + DOIList = [url.strip() for url in re.split(r'\s*,\s*', self.new_url)] + + # Saves list of added URLs + old_urls = list(self.DOIs) + + if DOIList: + # Create set to delete all duplicate URLs + tempSet = DOIList + def_set = set(tempSet) + # Warnings + # Invalid format + not_an_url = False + # Video does not exist + not_available = False + # Duplicate + doublon = False + # Numbers of each problem + nombre_de_problemes_not_url = 0 + nombre_de_problemes_not_available = 0 + nombre_de_problemes_doublon = 0 + indexx = 0 + list_indexx = [] + + # Loop over each new URL to validate it + for single_url in tempSet: + list_indexx.append(True) + for past_url in old_urls: + # Mark as duplicate if it already exists in old_urls + if single_url == past_url: + doublon = True + # print("there is a duplicate here") + list_indexx[indexx] = False + nombre_de_problemes_doublon += 1 + + # If 1 or more URL(s) in a list are not in the form + # of a URL from YouTube, the URL will not be added + # Regex to only accept YouTube URL format + # -- With the help of ChatGPT + # ("https://chatgpt.com/share/6800c404-cb74-8000-afef-e321b9517c47") -- + if not re.match(r"^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/.+$", + single_url): + not_an_url = True + # Each element is True or False depending + # on whether the URL passed all checks + if list_indexx[indexx] != False: + list_indexx[indexx] = False + nombre_de_problemes_not_url += 1 + + # Check if the URL exists + elif not youtube_video_exists(single_url): + not_available = True + # Each element is 
True or False depending on whether the URL passed all checks + if list_indexx[indexx] != False: + list_indexx[indexx] = False + nombre_de_problemes_not_available += 1 + + # Check that the URL is not a duplicate and is available + if doublon == False and not_an_url == False and not_available == False: + # print("the URL(s) are clean") + list_indexx[indexx] = True + # Move on to the next position in list_indexx + indexx += 1 + + # If a URL is a duplicate, then there is an error message + if doublon == True: + QMessageBox.information( + # The error message gives the number of duplicates found + None, "YouGet", + f"Error Message:

" + f"{nombre_de_problemes_doublon} duplicate URL(s) found and deleted.", + QMessageBox.Ok + ) + + # If a URL is not available or misspelled + if not_available == True: + QMessageBox.information( + # The error message gives the numbers of non available URLs found + None, "YouGet", + f"Error Message:

" + f"{nombre_de_problemes_not_available} URL(s) are not valid YouTube videos", + QMessageBox.Ok + ) + + # If a URL is not a URL + if not_an_url == True: + QMessageBox.information( + # The error message gives the numbers of non URLs found + None, "YouGet", + f"Warning Message:

" + f"{nombre_de_problemes_not_url} element(s) are not " + f"YouTube URLs or please check your internet connection.", + QMessageBox.Ok + ) + + # Removes the False URL(s) and keeps the rest + temp_set_liste = list(tempSet) + filtered_list = [] + for i, keep in enumerate(list_indexx): + if keep: + filtered_list.append(temp_set_liste[i]) + # Only URL(s) that pass all checks are kept and added to self.DOIs + self.DOIs += list(filtered_list) + self.DOIs = list(set(self.DOIs)) + self.URLLabel = self.DOIs + self.selectedURLLabel = self.DOIs + self.n_desired_comments = self.n_desired_comments + + + self.URLLabel = self.URLLabel + # Update on buttons + # Disable "Clear All" button if there are no URL(s) + self.clearAllButton.setDisabled(not bool(self.DOIs)) + # Trigger settings changed for the send button + self.sendButton.settingsChanged() + + def addDisabledOrNot(self): + """ + Disables the add button if no new URL is entered + """ + self.addButton.setDisabled(not bool(self.new_url)) + + def updateURLBoxButtons(self): + """ + Update state of the "Add" and "Remove" buttons + """ + self.addButton.setDisabled(not bool(self.new_url)) + self.removeButton.setDisabled(not bool(self.selectedURLLabel)) + + #---------- END: End of the section of code borrowed from SciHub.py ---------- + + def updateGUI(self): + """ + This method is intended to refresh or modify GUI elements based on the current + internal state or user interactions. + """ + pass + +if __name__ == '__main__': + WidgetPreview(YouGet).run() + + +def youtube_video_exists(url): + """ + This function checks whether a YouTube video exists and is playable at a given URL. 
+ """ + # Mimicking a browser so there is no blockage when requesting a URL + headers = { + "User-Agent": "Mozilla/5.0" + } + + # Sending an HTTP GET request to the YouTube video URL + # Grab data with the help of the internet + try: + response = requests.get(url, headers=headers) + # Check for successful response + # 200 means success, != 200 means failure + if response.status_code != 200: + return False + + # Extract the HTML content of the YouTube video page + html = response.text + + # Extract the "ytInitialPlayerResponse" JSON + initial_data_match = re.search(r'ytInitialPlayerResponse\s*=\s*({.+?});', html) + # print(initial_data_match) + + # If nothing is found, return False + if not initial_data_match: + # print("Unable to extract ytInitialPlayerResponse") + return False + + # Parse extracted JSON into a Python dict + data = json.loads(initial_data_match.group(1)) + # print(data) + # Check playability status + status = data.get("playabilityStatus", {}).get("status", "UNKNOWN") + + # Indicate the URL's playability status + if status == "OK": + # "OK" means that the URL is available + return True + else: + # If video not playable, return False + # print(f"Playability status: {status}") + return False + + # Catch errors during the request or while parsing the page + except Exception: + return False + + +def clean_date_str(date_str): + # Removes the "(edited)" mention and the spaces around it + return date_str.replace("(edited)", "").strip() + +def parse_date_safe(date_str): + # Parses the cleaned date string; returns None if it cannot be parsed + cleaned = clean_date_str(date_str) + dt = dateparser.parse(cleaned) + return dt diff --git a/orangecontrib/textable_prototypes/widgets/icons/YouGet.svg b/orangecontrib/textable_prototypes/widgets/icons/YouGet.svg new file mode 100644 index 00000000..0a925e1c --- /dev/null +++ b/orangecontrib/textable_prototypes/widgets/icons/YouGet.svg @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/orangecontrib/textable_prototypes/widgets/test.py 
b/orangecontrib/textable_prototypes/widgets/test.py new file mode 100644 index 00000000..4ddcf09e --- /dev/null +++ b/orangecontrib/textable_prototypes/widgets/test.py @@ -0,0 +1,9 @@ +from itertools import islice +from youtube_comment_downloader import * +downloader = YoutubeCommentDownloader() +comments = downloader.get_comments_from_url('https://www.youtube.com/watch?v=ScMzIvxBSi4', sort_by=SORT_BY_POPULAR) +#for comment in islice(comments, 10): +# print(comment) + +newlist = [x for x in comments] +print(newlist) \ No newline at end of file diff --git a/specs/YouGet.rst b/specs/YouGet.rst new file mode 100644 index 00000000..fc2e2fed --- /dev/null +++ b/specs/YouGet.rst @@ -0,0 +1,117 @@ +################################# +YouGet widget specification +################################# + +1 Introduction +************** + +1.1 Project goal +================= +Create a widget for Orange Textable that downloads the comments of a video from a YouTube URL. + +1.2 Milestones +===================== +* First version of the specification: 13.03.2025 +* Submission of the specification: 20.03.2025 +* Alpha version of the project: 17.04.2025 +* Submission and presentation of the project: 22.05.2025 + +1.3 Team and responsibilities +============================== +* Maintainer: + - Aris Xanthos (aris.xanthos@unil.ch) + +* Virgile Albasini (`virgile.albasini@unil.ch`_): + +.. _virgile.albasini@unil.ch: mailto:virgile.albasini@unil.ch + + - Specification + - Code + - Documentation + - Mockups + +* Sophie Ward (`sophie.ward@unil.ch`_): + +.. _sophie.ward@unil.ch: mailto:sophie.ward@unil.ch + + - Specification + - Code + - Documentation + - GitHub + +* Lorelei Chevroulet (`lorelei.chevroulet@unil.ch`_): + +.. _lorelei.chevroulet@unil.ch: mailto:lorelei.chevroulet@unil.ch + + - Specification + - Code + - Mockups + - Spell checking + +* Vincent Joris (`vincent.joris@unil.ch`_): + +.. 
_vincent.joris@unil.ch: mailto:vincent.joris@unil.ch + + - Specification + - Code + - Interface + - Tests + +2. Technical +************ + +2.1 Dependencies +================ + +* Orange 3.38.1 or later. + +* Orange Textable 3.2.2 or later. + +2.2 Minimal features +============================= + +* Input: none. + +* Enter the URL of a YouTube video to download its comments. +* Output: the comments in the form of a segmentation. + +.. image:: images/youget_minimal.png + +2.3 Main features +=============================== + +* Minimal features +* Import a list of URLs +* Choose the number of comments in output (minimum 1 comment, then 100, then 1000, or an unlimited number of comments). + + +.. image:: images/youget_principal.png + +2.4 Optional features +================================ + +* Output the comments sorted by likes or by date. + +2.5 Tests +========= + +* Check that the comments are exported. + +3. Milestones +************* + +3.1 Alpha version +================= +* The graphical interface is fully built. +* Downloading comments from YouTube videos works. + +3.2 Submission and presentation +=============================== +* The software documentation is complete. +* The main features are fully supported by the software. + + +4. Infrastructure +================= +The project is available on GitHub at `https://github.com/axanthos/TextablePrototypes.git +`_