diff --git a/README.md b/README.md index 79091a8..d8c9442 100644 --- a/README.md +++ b/README.md @@ -1,92 +1,62 @@ # Reddit NodeJS API -In this project, we're going to build a tiny Reddit API. But wait, don't we need to learn about -web servers, HTTP, REST and all these buzzwords before building an API?? Not really! +In this project, we're going to create a set of functions -- an API -- that will be used to build a full Reddit clone next week. The functions that we create will be doing what is commonly called "CRUD". -We won't be building a *Web* API yet, only a set of NodeJS callback-receiving functions that will -do the dirty work of taking our data and putting it in our database. +CRUD stands for "Create, Read, Update, Delete". It's the basic set of operations that we normally want to do on the entities of our application. For example, when you initially go to Reddit, one thing you might want to do is signup. Signing up should add a new entry to the `users` table in Reddit's database. In the code of `reddit.js`, this is the role of the `createUser` function: take a simple object with user's properties, make database queries to create the user, and return the `id` of the newly created user since it's the database that creates it. -The project has already been initialized with a few files to get you started: +We won't be building a *Web* API yet. This will be the topic of next week, where we will explore using the ExpressJS framework to create a web server with NodeJS. Next week, our web server code will use the code that we are creating in this workshop to talk to the database indirectly. This is preferable to making sql queries directly inside our web server code. If we ever make changes to our SQL schema or rules, we only have to change `reddit.js` and not the code that uses it. -* `reddit.sql`: This file contains `CREATE TABLE` statements for users and posts -* `reddit.js`: This file contains the actual Reddit API. The only things that should -be in there are functions that talk to our database -* `index.js`: This is the starting point of our application. +The project has already been initialized with a few files to get you started. -In the next section, we will review in detail the contents of `reddit.js` and `index.js`. +* `reddit.sql`: This file contains `CREATE TABLE` statements for the `users` and `posts` tables to get you started. You should add any further `CREATE TABLE`s from the workshop in this file. +* `reddit.js`: This file contains the actual Reddit API. The only things that should be in there are functions that talk to the database. +* `demo-using-api.js`: Basically, this file is a scratchpad. It is an **example** of how to use the Reddit API. First we have to create a database connection. Then use `new Reddit()` and pass it the connection. This will return an API object that has all the CRUD functions we need -- `createUser`, `createPost`, `getAllPosts`. We then show an example of using the API to create a new user. Once we receive the new user's ID, we use it to create a new post under that user. Finally, we print the new post ID in the console. As you are implementing the new functionality for this workshop, you can test your functions by calling them inside `demo-using-api.js` after commenting out the already existing code. + +In the next section, we will review in detail the contents of `reddit.js` and `demo-using-api.js`. ## Initial files ### `reddit.sql` -This file contains two `CREATE TABLE` statements, which we are already used to. You may notice there is -something weird going on with the `createdAt` and `updatedAt` fields: basically we want these fields -to be automatically set by MySQL, but MySQL only allows **one** `CURRENT_TIMESTAMP` field. So what we do -is set the `createdAt` to `DEFAULT 0`, but later on in `reddit.js` we set it to `NULL`. Since `NULL` is -not an acceptable value here, MySQL will make it default to the current timestamp, which is what we wanted. - -### `index.js` -This is the "main" file of our application, meaning it's the one we will run to get things done. -First, we load the `mysql` NodeJS library. This library will let us communicate with MySQL similarly -to what we have been doing with the command-line: writing queries and getting responses. - -Next, we create a connection to a MySQL server. On Cloud9, our database does not need to be 100% secure, -so we are setup with a MySQL server that we can login to with our Cloud9 username and no password. The -connection is using the `reddit` database, which we haven't created yet. - -After that, we load the `./reddit` module, which exports a single function: the reason for this is -that all the functions of our API require a database connection. Instead of establishing the database -connection inside the `reddit.js` file, we choose to keep it pure: we pass the connection to the -function, and it returns to us the actual API. This is a common pattern in development. This is a good -time to make sure you understand what is going on, and ask questions if not. - -Finally, after having initialized the API, we can start doing ad-hoc requests to it. In the current -example, we are creating a new user, then using its `id` to create a new post. +This file contains `CREATE TABLE` statements for the `users` and `posts` tables to get you started. You should add any further `CREATE TABLE`s from the workshop in this file. + +### `demo-using-api.js` +This will look like the "main" file of our application, meaning it's the one we will run to get things done. First, we load the `mysql` NodeJS library. This library will let us communicate with MySQL similarly to what we have been doing with the command-line: writing queries and getting responses. + +Next, we create a connection to a MySQL server. On Cloud9, our database does not need to be 100% secure, so we are setup with a MySQL server that we can login to with our Cloud9 username and no password. The connection is using the `reddit` database, which we haven't created yet. + +After that, we load the `./reddit` module, which exports a single function: the reason for this is that all the functions of our API require a database connection. Instead of establishing the database connection inside the `reddit.js` file, we choose to keep it pure: we pass the connection to the function, and it returns to us the actual API. This is a common pattern in development. This is a good time to make sure you understand what is going on, and ask questions if not. + +Finally, after having initialized the API, we can start doing ad-hoc requests to it. In the current example, we are creating a new user, then using its `id` to create a new post. ### `reddit.js` -This is the core file of the project. Even though it only exports one function, this function is only -there for the purpose of accepting a database connection object. Once it receives a connection, it -returns the actual API of our Reddit clone. +This is the core file of the project. It exports a class `RedditAPI`. We can call the constructor by providing a `node-mysql` database connection object. The object will have the following functions: #### `createUser` -The first function we see being exported is `createUser`. It takes an object of user properties, and a callback. -This is required, because the mysql library we are using is also callback-based. Since we are getting -our result in a callback, we also have to accept a callback to pass the final result to. +The first function we see being exported is `createUser`. It takes an object of user properties, and returns a Promise. As we can see from the code of the function, the `user` argument needs to have a `username` and a `password`. + +The first thing our `createUser` function does is "hash" the user's password using the `bcrypt` library. This step is necessary to protect our users' information. **It is computationaly infeasible to recover the actual password from the hash**. The way this works is that when the user logs in, we hash the provided password using the same function. If both hashes match, then it's a success. Otherwise we can safely say it's the wrong password. We will be looking at hashing in more detail next week. -The first thing our `createUser` function does is "hash" the user's password using the `bcrypt` library. This step -is necessary to protect our users' information. **It is computationaly infeasible to recover the actual password -from the hash**. The way this works is that when the user logs in, we hash the provided password using the -same function. If both hashes match, then it's a success. Otherwise we can safely say it's the wrong password. +Once the hashing is completed, we get back the hashed password through the `Promise`. We use the hashed password to do an `INSERT` in our database. -Once the hash is completed, we get back the hashed password in our callback. We use the hashed password -to do an `INSERT` in our database. Here we are using the `createdAt` "hack": basically we are setting -the column to `NULL`, and MySQL will set it to the current timestamp. The `updatedAt` column is fully -managed by MySQL: it will set it when we create the user, and update it whenever we make a change to that user. +Another thing you will notice is the `?`s in the SQL query. These placeholders are **super important**. First off, they make it so that we don't have to concatenate strings together for every parameter in the query. But more importantly, they will make sure to **properly escape** any string we give to them. To make this work, we pass the `conn.query` function an array of the strings that should replace the `?`s, and it puts the query together for us. -Another thing you will notice is the `?`s in the SQL query. These placeholders are **super important**. First off, -they make it so that we don't have to concatenate strings together to infinity. But more importantly, -they will make sure to **properly escape** any string we give to them. To make this work, we pass the -`conn.query` function an array of the strings that should replace the `?`s, and it puts the query together for us. +When this function succeeds, it calls back with the new user ID. #### `createPost` -This works similarly to the `createUser` function, except we don't need the password hashing step. +This works similarly to the `createUser` function, except we don't need the password hashing step. The `post` argument needs to have a `userId`, `title` and `url`. + +When this function succeeds, it calls back with the new post ID. #### `getAllPosts` -This function is different from the previous two in that it doesn't add any data to our system. It uses -a regular `SELECT` to retrieve all the posts. The function requires an `options` object, which for the moment -can contain a `numPerPage` and a `page`. These will be translated to a `LIMIT` and `OFFSET` to allow -for pagination of our posts. +This function is different from the previous two in that it doesn't add any data to our system. It uses a regular `SELECT` to retrieve all the posts, with a `LIMIT` of 25. Eventually, pagination can be added to retrieve the next set of 25 results, or change the size of each page. + +When this function succeeds, it calls back with an array of objects. Each object represents a post in the database. In this initial version, the posts are always ordered by `createdAt DESC`, which means newer posts will appear at the top of the result set. -## Your work -Your work will consist in incrementally adding features to the API to make it more complete. Most of the -features are independent, so they can be worked on separately. If you want to work on more than one -feature at a time, it would be a good idea to create a branch for each feature until you are ready -to merge it to master ;) +## Building out the API +Your work will consist in incrementally adding features to the API to make it more complete. Most of the features are independent, so they can be worked on separately. If you want to work on more than one feature at a time, it would be a good idea to create a branch for each feature until you are ready to merge it to master. ### Improve the `getAllPosts` function -At the moment, the `getAllPosts` function is returning an array of posts. The problem is that it's -hard to figure out the username associated to each post. Since our database schema is somewhat -**normalized**, the posts table only contains a **reference** to the users table, through the `userId` -column. Your job is to improve this function by returning the user associated with each post. +At the moment, the `getAllPosts` function is returning an array of posts, which is exactly what we want. The problem is that it's hard to figure out the username associated to each post. Since our database schema is **normalized**, the posts table only contains a **reference** to the users table, through the `userId` column. Your job is to improve this function by returning the user associated with each post. For example, instead of returning: @@ -102,7 +72,9 @@ For example, instead of returning: } ] ``` + You should return: + ```json [ { @@ -111,9 +83,8 @@ You should return: "url": "https://www.noob.com", "createdAt": "...", "updatedAt": "...", - "userId": 1, "user": { - "id": 12, + "id": 1, "username": "n00bster", "createdAt": "...", "updatedAt": "..." @@ -122,82 +93,162 @@ You should return: ] ``` -You can achieve this by completing the current query and adding a `JOIN` to it. - -### Add a `getAllPostsForUser(userId, options, callback)` function -The function `getAllPosts` returns all the posts for all the users in the system (with a limit). Here, -we want to return only the posts for one userId. It will be quite similar to the `getAllPosts` function, -except that it will take an additional `userId` parameter. Your function should also use the `numPerPage` and `page` options to provide a paginated result set. +You can achieve this by adding a `JOIN` to the existing query, then **transforming** the flat array you get back using a `map`. +**Note**: MySQL cannot return a nested dataset. The result of your join will still be an array of flat objects. you will have to use the `map` method to return the data in the expected format. -### Add a `getSinglePost(postId, callback)` function -Currently there is no way to retrieve a single post by its ID. This would be important for eventually -displaying this data on a webpage. Create this function, and make it return a **single post**, without array. - - -### Add subreddits functionality -This feature will be more complicated to implement, because it will require not only adding new functions, -but also modifying existing ones. +### Add a subreddits functionality +This feature will be more complicated to implement, because it will require not only adding new functions, but also modifying existing ones. #### Step 1: -The first step will be to create a `subreddits` table. Each subreddit should have a unique, auto incrementing -`id`, a `name` anywhere from 1 to 30 characters, and an optional description of up to 200 characters. Each sub -should also have `createdAt` and `updatedAt` timestamps that you can copy from an existing table. To guarantee -the integrity of our data, we should make sure that the `name` column is **unique**. +The first step will be to create a `subreddits` table. Each subreddit should have a unique, auto incrementing `id` that is also the primary key, a `name` anywhere from 1 to 30 characters, and an optional `description` of up to 200 characters. Each sub should also have `createdAt` and `updatedAt` `DATETIME` fields. To guarantee the integrity of our data, we should make sure that the `name` column is **unique** by using a `UNIQUE INDEX`. There should never be two subreddits with the same `name`. -Once you figure out the correct `CREATE TABLE` statement, add it to `reddit.sql` with a comment. +Once you write the correct `CREATE TABLE` statement, add it to `reddit.sql` and execute it to create the table. #### Step 2: -Then we need to add a `subredditId` column to the posts table, with associated foreign key. Once you figure -out the correct `ALTER TABLE` statement, make sure to add it to `reddit.sql` with a comment. +Then we need to add a `subredditId` column to the posts table, because now each post will be related to a subreddit. There are two ways to do this: either with an `ALTER TABLE`, or by `DROP`ping and recreating the table. Here we will choose to do it with an `ALTER TABLE`. + +Write an `ALTER TABLE` query that will add a `subredditId INT` column to the `posts` table, as well as a foreign key constraining its values to valid `id`s in the `subreddits` table you just created. + +Once you write the correct `ALTER TABLE` statement, add it to `reddit.sql` and execute it to modify the table. #### Step 3: -In the `reddit.js` API, add a `createSubreddit(sub, callback)` function. It should take a subreddit object which -contains a `name` and optional `description` property. It should insert the new subreddit, and either -return an error or the newly created subreddit. You can take some inspiration from the `createPost` function -which operates in a similar way :) +In the `reddit.js` API, add a `createSubreddit(subreddit)` function. It should take a subreddit object that contains a `name` and `description`. It should insert the new subreddit, and return the ID of the new subreddit. You can take some inspiration from the `createUser` function which operates in a similar way. Just like for `createUser`, you'll have to check if you get a "duplicate entry" error and send a more specific error message. #### Step 4: -In the `reddit.js` API, add a `getAllSubreddits(callback)` function. It should return the list of all -subreddits, ordered by the newly created one first. +In the `reddit.js` API, add a `getAllSubreddits()` function. It should return the list of all subreddits, ordered by the newly created ones first, as a `Promise`. #### Step 5: -In the `reddit.js` API, modify the `createPost` function to take a `subredditId` parameter and use it. +In the `reddit.js` API, modify the `createPost` function to look for a `subredditId` property in the `post` object and use it. `createPost` should return an error if `subredditId` is not provided. #### Step 6 -In the `reddit.js` API, modify the `getAllPosts` function to return the **full subreddit** associated with each post. -You will have to do an extra `JOIN` to accomplish this. +In the `reddit.js` API, modify the `getAllPosts` function to return the **full subreddit** associated with each post. You will have to do an extra `JOIN` to accomplish this. +### Add the voting system for posts +Reddit wouldn't be what it is without its voting system. The mix of up/down votes and good scoring functions makes it possible to view the world of Reddit from all kinds of points of view. -### Add comments functionality -This feature will be complicated to implement because it will require not only adding new code, but also -modifying existing code and databases. For this reason the steps are outlined in detail. +To make the rest of the instructions clearer, let's define some terms that are proper to us and that describe the vote parameters and scoring functions. Note that the scoring functions are made for simplicity and not accuracy. + +* **`numUpvotes`**: The number of upvotes for a given post +* **`numDownvotes`**: The number of downvotes for a given post +* **`totalVotes`**: `= numUpvotes + numDownvotes` +* **`voteScore`**: `= numUpvotes - numDownvotes` +* **Top ranking**: `= voteScore` +* **Hotness ranking**: `= voteScore / (amount of time the post has been online in seconds.)` -- if two posts have a `voteScore` of 100, the post that has been online the least amount of time is hotter because it received those votes faster. +* **Newest ranking**: `= createdAt` #### Step 1: -The first step will be to create a `comments` table. Each comment should have a unique, auto incrementing -`id` and a `text` anywhere from 1 to 10000 characters. It should also have `createdAt` and `updatedAt` -timestamps that you can copy from an existing table. Each comment should also have a `userId` linking -it to the user who created the comment (using a foreign key), a `postId` linking it to the post which is -being commented on, and a `parentId` linking it to the comment it is replying to. A top-level comment should -have `parentId` set to `NULL`. +Add a `votes` table to your database. The way our `votes` table will be setup is often referred to as a "join table". Its goal is to allow a many-to-many relation between the `posts` and `users` tables. In this case, a single user can vote on many posts, and a single post can be voted on by many users. + +When creating the `votes` table, the primary key will be set to the pair `(postId, userId)`. This will ensure that a single user can only vote once on the same post. It will do so by disallowing queries that would introduce a pair that already exists in the database. It's common for a join table to not have its own automatically incrementing, unique ID. To accomplish this you can simply write `PRIMARY KEY (userId, postId)` in your `CREATE TABLE`. Finally, each of these two ID columns will need a foreign key referencing their respective tables. -Once you figure out the correct `CREATE TABLE` statement, add it to `reddit.sql` with a comment. +In addition to the two ID columns, the `votes` table will need a `voteDirection` column which can be set to `TINYINT`. It will take the value of `1` to signify an upvote, and a value of `-1` to signify a downvote. This way, when we `GROUP BY postId`, we can do a `SUM` over the `voteDirection` column and easily get the `voteScore` for each post we are interested in. We will also add `createdAt` and `updatedAt` columns to this table. + +Since the create table statement is a bit different from the others, we will provide it to you. Note that this type of table is very common for many-to-many relations. + +```sql +CREATE TABLE votes ( + userId INT, + postId INT, + voteDirection TINYINT, + createdAt DATETIME NOT NULL, + updatedAt DATETIME NOT NULL, + PRIMARY KEY (userId, postId), -- this is called a composite key because it spans multiple columns. the combination userId/postId must be unique and uniquely identifies each row of this table. + KEY userId (userId), -- this is required for the foreign key + KEY postId (postId), -- this is required for the foreign key + FOREIGN KEY (userId) REFERENCES users (id) ON DELETE CASCADE, -- CASCADE means also delete the votes when a user is deleted + FOREIGN KEY (postId) REFERENCES posts (id) ON DELETE CASCADE, -- CASCADE means also delete the votes when a post is deleted +); + +``` #### Step 2: -In the `reddit.js` API, add a `createComment(comment, callback)` function. It should take a comment object which -contains a `text`, `userId`, `postId` and optional `parentId`. It should insert the new comment, and either -return an error or the newly created comment. If `parentId` is not defined, it should be set to `NULL`. You can -take some inspiration from the `createPost` function which operates in a similar way :) +Add a function called `createVote(vote)` to your Reddit API. This function will take a `vote` object with `postId`, `userId`, `voteDirection`. +Your function should make sure that the `voteDirection` is either `1`, `0` (to cancel a vote) or `-1`. If it's different, your function should return an error. + +If we do the query with a regular `INSERT` we can run into errors. The first time a user votes on a given post will pass. But if they try to change their vote direction, the query will fail because we would be trying to insert a new vote with the same `postId`/`userId`. One way to fix this would be to first check if the user has already voted on a post by doing a `SELECT`, and `UPDATE` the corresponding row if they have. But SQL gives us a better way to do that: [`ON DUPLICATE KEY UPDATE`](https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html). With it, we can write our voting query like this: + +```sql +INSERT INTO votes SET postId=?, userId=?, voteDirection=? ON DUPLICATE KEY UPDATE voteDirection=?; +``` +This way, the first time user#1 votes for post#1, a new row will be created. If they change their mind and send a different vote for the same post, then the `voteDirection` column of the same row will be updated instead. + +#### Step 3: +Go back to your `getAllPosts` function. We will need to change two things in there: + +1. Now that we have voting, we need to add the `voteScore` of each post by doing an extra `JOIN` to the `votes` table, grouping by `postId`, and doing a `SUM` on the `voteDirection` column. +2. To make the output more interesting, we need to `ORDER` the posts by the highest `voteScore` first instead of creation time. + +## Filling the database with "real data" +At this point you should have a database with `users`, `posts`, `subreddits`, and `votes` tables. You also have a set of JavaScript functions written for NodeJS that can communicate with your database server and query it to insert new data as well as retrieve existing data. + +However, your database is empty. In this section, we will take advantage of reddit.com's JSON API to fetch some real Reddit data and populate our database. This will make us practice asynchronous programming as well as communication with web APIs. + +First, go to your browser and open two tabs. In one tab, navigate to `https://www.reddit.com/`. In the other tab navigate to `https://www.reddit.com/.json`. What you see in the `.json` tab is the data equivalent of the regular Reddit home page. In there you have an array of post objects. Each object contains data about a certain post on Reddit like the user who posted, the subreddit, the time of creation, etc. + +Reddit has such a data-oriented version for pretty much each page of its website. For example if you load `https://www.reddit.com/r/montreal.json`, you will see the data for the /r/Montreal homepage. + +For this part of the project, we're going to write a small crawler for the Reddit.com JSON API. Our crawler will "discover" a list of the 25 most popular subreddits. For each subreddit, we will make a request to get the posts on its front page. This will give us a database with ~2000 posts in 50 subreddits. The posts should mostly be by different users. We will be skipping self posts since they don't fit with our model. + +### Step 1: Read the code in `reddit-crawl.js` +To fill our database with some real Reddit data, we will be writing and running a script called `reddit-crawl.js`. A starter file has been provided for you. It contains one function `getSubreddits` that makes a request to the Reddit JSON API and retrieves a list of subreddits. Since the data returned from the Reddit API is quite thick, the function does a `map` on the parsed data to return only a list of subreddit name strings. + +### Step 2: Complete the function `getPostsForSubreddit` +This function should return a list of the top 50 posts for the `subredditName` passed to it as an argument. If the `subredditName` is `montreal`, then you should make a request to `https://www.reddit.com/r/montreal.json?limit=50`. + +When receiving the data, you will get an array of 50 posts under `data.children`. Just like for `getSubreddits`, start by parsing the result as JSON. + +Then, similarly to `getSubreddits`, we will also be calling array methods to do some transformations on the data: + +1. Using the `filter` method, get rid of all posts for which the `is_self` property is `true`. +2. Using the `map` method on the result, return objects with only these properties: + * `title`: take it from the `title` property of the data + * `url`: take it from the `url` property of the data + * `user`: take it from the `author` property of the data + +### Step 3: Do the crawling! +Armed with these two functions and the functions from `reddit.js`, we should be able to do what we need. + +Notice that the `crawl` function has been provided to you! It uses the two funtions you just completed. All you have to do is execute `crawl`. Then, check your database to see if all the data got imported. This will test some of your `RedditAPI` functions at the same time. + +--- + +## CHALLENGE: Adding a post comments functionality and filling it with "real data" +This feature will be complicated to implement because it will require not only adding new code, but also modifying existing code and databases. For this reason the steps are outlined in detail. By the end, we should have a function that allows us to add a comment to a post, both as a top-level comment or as a reply to another comment. We will also have a function that lists all the comments for a post, nested by reply level. + +#### Step 1: +The first step will be to create a `comments` table. Each comment should have a unique, auto incrementing `id` and a `text` anywhere from 1 to 10000 characters. It should also have `createdAt` and `updatedAt` `DATETIME` columns. Each comment should also have a `userId` linking it to the user who created the comment (using a foreign key), a `postId` linking it to the post which is being commented on, and a `parentId` linking it to the comment it is replying to. A top-level comment should have `parentId` set to `NULL`. + +Since this `CREATE TABLE` is a bit complicated, we are providing a solution for you. Try to find the correct solution before looking at this one. + +```sql +CREATE TABLE comments ( + id INT AUTO_INCREMENT PRIMARY KEY, + userId INT, + postId INT, + parentId INT, + text VARCHAR(10000), + FOREIGN KEY (userId) REFERENCES users (id) ON DELETE SET NULL, + FOREIGN KEY (userId) REFERENCES posts (id) ON DELETE SET NULL, + FOREIGN KEY (parentId) REFERENCES comments (id) +); +``` + +#### Step 2: +In the `reddit.js` API, add a `createComment(comment)` function. It should take a comment object which contains a `text`, `userId`, `postId` and optional `parentId`. It should insert the new comment, and either return an error or the ID of the new comment. If `parentId` is not defined, it should be set to `NULL`. You can take some inspiration from the `createPost` function which operates in a similar way. #### Step 4: -In the `reddit.js` API, add a `getCommentsForPost(postId, callback)` function. It should return a **thread** +In the `reddit.js` API, add a `getCommentsForPost(postId, levels)` function. It should return a **thread** of comments in an array. The array should contain the top-level comments, and each comment can optionally have -a `replies` array. This array will contain the comments that are replies to the current one. Since you will be -using one `LEFT JOIN` per level of comment, we will limit this exercise to retrieving 3 levels of comments. -The comments should be sorted by their `createdAt` date at each level. +a `replies` array. This array will contain the comments that are replies to the current one. + +We can do this by using a recursive function. The steps will be like this: + +1. Retrieve the top-level comments for a given post ID. These are the ones where parent_id is NULL. +2. Gather up all the unique IDs of the top-level comments, and retrieve those comments where the parentId is equal to one of those IDs +3. Keep going until either you have no more comments at a certain level, or you have reached `level` levels of replies. -**NOTE**: The way this exercise is done, the comments will be returned without the associated usernames. We will -be getting only the `userId` instead. The next exercise asks you to add the username to the data. +The implementation is left up to you. In the end you will be doing one query per level of replies instead of one alltogether, but they will all be done by the same recursive function. The final output should look something like this: @@ -248,83 +299,5 @@ The final output should look something like this: ] ``` -#### Step 6 -In the `reddit.js` API, modify the `getSinglePost` function to return the full thread of comments in -addition to the post data itself. This will require re-using your `JOIN` logic from `getCommentsForPost`. - - -### Add usernames to the comments functionality -Return to the comments functionality and add the username for each comment. Since we are only requiring the -username and not the full user object, you don't need to nest a user object with each comment. A `username` -property will be sufficient. For example: - -```json -{ - "id": 486, - "text": "You mean George Orwell?", - "createdAt": "...", - "updatedAt": "...", - "username": "PM_ME_YOUR_BOOKIES" -} -``` - -### Add the voting system for posts only -Reddit wouldn't be what it is without its voting system. The mix of up/down votes and good scoring -functions makes it possible to view the world of Reddit from all kinds of points of view. - -To make the rest of the instructions clearer, let's define some terms that are proper to us and -that describe the vote parameters and scoring functions. Note that the scoring functions are made -for simplicity and not accuracy. They certainly wouldn't give rise to the same dynamism that is -seen on reddit. - -* **`numUpvotes`**: The number of upvotes for a given post -* **`numDownvotes`**: The number of downvotes for a given post -* **`totalVotes`**: `= numUpvotes - numDownvotes` -* **`voteScore`**: `= numUpvotes - numDownvotes` -* **Top ranking**: `= voteScore` -* **Hotness ranking**: `= voteScore / (amount of time the post has been online)` -* **Newest ranking**: `= createdAt` -* **Controversial ranking**: ```= numUpvotes < numDownvotes ? totalVotes * (numUpvotes / numDownvotes) : totalVotes * (numDownvotes / numUpvotes)``` we can filter out posts that have few votes (< 100) since they may not be meaningful. - -#### Step 1: -Add a `votes` table to your database. The way our `votes` table will be setup is often referred to as a "join table". Its goal -is to allow so-called many-to-many relations. In this case, a single user can vote on many posts, and a single post can -be voted on by many users. For this reason, we can't simply have a `voterId` in the `posts` table. Neither can we have a `postVotedOn` -or something like that in the `users` table. - -When creating the `votes` table, the primary key -- a unique key -- will be set to the pair `(postId, userId)`. This will ensure that -a single user can only vote once on the same post. It will do so by disallowing queries that would introduce a pair that already exists -in the database. It's common for a join table to not have its own automatically incrementing, unique ID. The link between the two tables -is unique enough, and makes more sense. To do this you can simply write `PRIMARY KEY (userId, postId)` in your `CREATE TABLE`. Finally, -each of these two ID columns will need a foreign key referencing their respective tables. - -In addition to the two ID columns, the `votes` table will need a `vote` column which can be set to `TINYINT`. It will take the value of -`1` to signify an upvote, and a value of `-1` to signify a downvote. This way, when we `GROUP BY postId`, we can do a `SUM` over the `vote` -column and easily get the `voteScore` for each post we are interested in. We can also add `createdAt` and `updatedAt` columns to this table. - -#### Step 2: -Add a function called `createVote(vote, callback)` to your Reddit API. This function will take a `vote` object with `postId`, `userId`, `vote`. -It should make sure that the `vote` is either `1`, `0` (to cancel a vote) or `-1`. Otherwise it should reject the request. - -If we query with a regular `INSERT` we can run into errors. The first time a user votes on a given post will pass. But -if they try to change their vote direction, the query will fail because of a duplicate key. While we could check for this and do an `UPDATE` -query instead, MySQL has a better way: the "[`ON DUPLICATE KEY UPDATE`](https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html)". With -it, we can write our voting query like this: -```sql -INSERT INTO `votes` SET `postId`=1, `userId`=1, `vote`=1 ON DUPLICATE KEY UPDATE `vote`=1; -``` -This way, the first time user#1 votes for post#1, a new row will be created. If they change their minds or try to trick the system, then the `vote` -column of the same row will be updated instead. - -Before you move on to the next step, it would be nice to rename your function from `createVote` to `createOrUpdateVote` to reflect more closely -what it is doing. - -#### Step 3: -Go back to your `getAllPosts` function. Add a `sortingMethod` option to the function, which will default to `new` -- the sorting we are currently using. -Then, one at a time, start implementing the different sorting methods mentioned above. The easiest one is `top` because it's simply ranking by the -`voteScore` in descending order. - -As a first step, add a `voteScore` property to each post that you retrieve. Do this by `JOIN`ing the `posts` table with the `votes` table, and grouping by `postId`. -Add a `SUM` on the `vote` column of the `votes` table, and give it an alias of `voteScore`. - -Then start implementing each sorting method as you see fit, changing the `ORDER BY` clause of your query. +#### Step 5: +Taking inspiration from the crawler in `reddit-crawl.js`, fill up your database with random post comments coming from Reddit's own JSON API. \ No newline at end of file diff --git a/demo-using-api.js b/demo-using-api.js new file mode 100644 index 0000000..76d2cef --- /dev/null +++ b/demo-using-api.js @@ -0,0 +1,70 @@ + +// load the mysql library +var mysql = require('promise-mysql'); + +// create a connection to our Cloud9 server +var connection = mysql.createPool({ + host : 'localhost', + user : 'root', // CHANGE THIS :) + password : '', + database: 'reddit', + connectionLimit: 10 +}); + +// load our API and pass it the connection +var RedditAPI = require('./reddit'); + +var myReddit = new RedditAPI(connection); + +// We call this function to create a new user to test our API +// The function will return the newly created user's ID in the callback +// myReddit.createUser({ +// username: 'PM_ME_CUTES', +// password: 'abc123' +// }) +// .then(newUserId => { +// // Now that we have a user ID, we can use it to create a new post +// // Each post should be associated with a user ID +// console.log('New user created! ID=' + newUserId); + + // /*return*/ myReddit.createPost({ + // title: 'A grate joke!', + // url: 'http://www.digg.com', + // userId: 1, + // subredditId: 2 + // }) + + // .then(newPostId => { + // // If we reach that part of the code, then we have a new post. We can print the ID + // console.log('New post created! ID=' + newPostId); + // }) + // .catch(error => { + // console.log(error.stack); + // }); + + + myReddit.getAllPosts() + .then(response => { + // console.log(response) + } + ) + .then(response => { + connection.end(); + }); + + // myReddit.createSubreddit({name: 'funny', description: 'grate jokes'}) + // .then(response => { + // console.log(response); + // }) + // .then(response => { + // connection.end(); + // }); + // myReddit.getAllSubreddits() + // .then(response => { + // console.log(response); + // }) + // .then (response => { + // connection.end(); + // }); + + \ No newline at end of file diff --git a/index.js b/index.js deleted file mode 100644 index d6b7285..0000000 --- a/index.js +++ /dev/null @@ -1,38 +0,0 @@ -// load the mysql library -var mysql = require('mysql'); - -// create a connection to our Cloud9 server -var connection = mysql.createConnection({ - host : 'localhost', - user : 'ziad_saab', // CHANGE THIS :) - password : '', - database: 'reddit' -}); - -// load our API and pass it the connection -var reddit = require('./reddit'); -var redditAPI = reddit(connection); - -// It's request time! -redditAPI.createUser({ - username: 'hello23', - password: 'xxx' -}, function(err, user) { - if (err) { - console.log(err); - } - else { - redditAPI.createPost({ - title: 'hi reddit!', - url: 'https://www.reddit.com', - userId: user.id - }, function(err, post) { - if (err) { - console.log(err); - } - else { - console.log(post); - } - }); - } -}); \ No newline at end of file diff --git a/package.json b/package.json index 35da6ac..4d38865 100644 --- a/package.json +++ b/package.json @@ -11,6 +11,7 @@ "license": "ISC", "dependencies": { "bcrypt": "^0.8.6", - "mysql": "^2.10.2" + "bcrypt-as-promised": "^1.1.0", + "promise-mysql": "^3.0.1" } } diff --git a/reddit-crawl.js b/reddit-crawl.js new file mode 100644 index 0000000..7a33e21 --- /dev/null +++ b/reddit-crawl.js @@ -0,0 +1,100 @@ +var request = require('request-promise'); +var mysql = require('promise-mysql'); +var RedditAPI = require('./reddit'); + +function getSubreddits() { + return request(/* fill in the URL, it's always the same */) + .then(response => { + // Parse response as JSON and store in variable called result + var response; // continue this line + + // Use .map to return a list of subreddit names (strings) only + return response.data.children.map(/* write a function */) + }); +} + +function getPostsForSubreddit(subredditName) { + return request(/* fill in the URL, it will be based on subredditName */) + .then( + response => { + // Parse the response as JSON and store in variable called result + var response; // continue this line + + + return response.data.children + .filter(/* write a function */) // Use .filter to remove self-posts + .map(/* write a function */) // Use .map to return title/url/user objects only + + } + ); +} + +function crawl() { + // create a connection to the DB + var connection = mysql.createPool({ + host : 'localhost', + user : 'root', + password : '', + database: 'reddit', + connectionLimit: 10 + }); + + // create a RedditAPI object. we will use it to insert new data + var myReddit = new RedditAPI(connection); + + // This object will be used as a dictionary from usernames to user IDs + var users = {}; + + /* + Crawling will go as follows: + + 1. Get a list of popular subreddits + 2. Loop thru each subreddit and: + a. Use the `createSubreddit` function to create it in your database + b. When the creation succeeds, you will get the new subreddit's ID + c. Call getPostsForSubreddit with the subreddit's name + d. Loop thru each post and: + i. Create the user associated with the post if it doesn't exist + 2. Create the post using the subreddit Id, userId, title and url + */ + + // Get a list of subreddits + getSubreddits() + .then(subredditNames => { + subredditNames.forEach(subredditName => { + var subId; + myReddit.createSubreddit({name: subredditName}) + .then(subredditId => { + subId = subredditId; + return getPostsForSubreddit(subredditName) + }) + .then(posts => { + posts.forEach(post => { + var userIdPromise; + if (users[post.user]) { + userIdPromise = Promise.resolve(users[post.user]); + } + else { + userIdPromise = myReddit.createUser({ + username: post.user, + password: 'abc123' + }) + .catch(function(err) { + return users[post.user]; + }) + } + + userIdPromise.then(userId => { + users[post.user] = userId; + return myReddit.createPost({ + subredditId: subId, + userId: userId, + title: post.title, + url: post.url + }); + }); + }); + }); + }); + }); +} \ No newline at end of file diff --git a/reddit.js b/reddit.js index 7dbed8e..a15231c 100644 --- a/reddit.js +++ b/reddit.js @@ -1,114 +1,150 @@ -var bcrypt = require('bcrypt'); +var bcrypt = require('bcrypt-as-promised'); var HASH_ROUNDS = 10; -module.exports = function RedditAPI(conn) { - return { - createUser: function(user, callback) { - - // first we have to hash the password... - bcrypt.hash(user.password, HASH_ROUNDS, function(err, hashedPassword) { - if (err) { - callback(err); - } - else { - conn.query( - 'INSERT INTO `users` (`username`,`password`, `createdAt`) VALUES (?, ?, ?)', [user.username, hashedPassword, null], - function(err, result) { - if (err) { - /* - There can be many reasons why a MySQL query could fail. While many of - them are unknown, there's a particular error about unique usernames - which we can be more explicit about! - */ - if (err.code === 'ER_DUP_ENTRY') { - callback(new Error('A user with this username already exists')); +class RedditAPI { + constructor(conn) { + this.conn = conn; + } + + createUser(user) { + /* + first we have to hash the password. we will learn about hashing next week. + the goal of hashing is to store a digested version of the password from which + it is infeasible to recover the original password, but which can still be used + to assess with great confidence whether a provided password is the correct one or not + */ + return bcrypt.hash(user.password, HASH_ROUNDS) + .then(hashedPassword => { + return this.conn.query('INSERT INTO users (username,password, createdAt, updatedAt) VALUES (?, ?, NOW(), NOW())', [user.username, hashedPassword]); + }) + .then(result => { + return result.insertId; + }) + .catch(error => { + // Special error handling for duplicate entry + if (error.code === 'ER_DUP_ENTRY') { + throw new Error('A user with this username already exists'); } else { - callback(err); + throw error; } - } - else { - /* - Here we are INSERTing data, so the only useful thing we get back - is the ID of the newly inserted row. Let's use it to find the user - and return it - */ - conn.query( - 'SELECT `id`, `username`, `createdAt`, `updatedAt` FROM `users` WHERE `id` = ?', [result.insertId], - function(err, result) { - if (err) { - callback(err); - } - else { - /* - Finally! Here's what we did so far: - 1. Hash the user's password - 2. Insert the user in the DB - 3a. If the insert fails, report the error to the caller - 3b. If the insert succeeds, re-fetch the user from the DB - 4. If the re-fetch succeeds, return the object to the caller - */ - callback(null, result[0]); - } - } - ); - } + }); + } + + createSubreddit(subreddit){ + return this.conn.query(` + INSERT INTO subreddits (name, description, createdAt, updatedAt) + VALUES (?, ?, NOW(), NOW())`, [subreddit.name, subreddit.description] + ) + .then(result => { + return result.insertId; + }) + .catch(error => { + if (error.code === 'ER_DUP_ENTRY'){ + throw new Error('A subreddit with this name already exists'); } - ); - } - }); - }, - createPost: function(post, callback) { - conn.query( - 'INSERT INTO `posts` (`userId`, `title`, `url`, `createdAt`) VALUES (?, ?, ?, ?)', [post.userId, post.title, post.url, null], - function(err, result) { - if (err) { - callback(err); - } - else { - /* - Post inserted successfully. Let's use the result.insertId to retrieve - the post and send it to the caller! - */ - conn.query( - 'SELECT `id`,`title`,`url`,`userId`, `createdAt`, `updatedAt` FROM `posts` WHERE `id` = ?', [result.insertId], - function(err, result) { - if (err) { - callback(err); + else{ + throw error; + } + }); + } + + createPost(post) { + return this.conn.query( + ` + INSERT INTO posts (userId, title, url, createdAt, updatedAt, subredditId) + VALUES (?, ?, ?, NOW(), NOW(), ?)`, + [post.userId, post.title, post.url, post.subredditId] + ) + .then(result => { + return result.insertId; + }) + .catch(error => { + if (!post.subredditId){ + throw new Error('Could not find subreddit Id'); } - else { - callback(null, result[0]); + else{ + throw error; } - } + }); + } + + getAllPosts() { + /* + strings delimited with ` are an ES2015 feature called "template strings". + they are more powerful than what we are using them for here. one feature of + template strings is that you can write them on multiple lines. if you try to + skip a line in a single- or double-quoted string, you would get a syntax error. + + therefore template strings make it very easy to write SQL queries that span multiple + lines without having to manually split the string line by line. + */ + return this.conn.query( + ` + SELECT posts.id AS thePostId, + title, + url, + userId, + posts.createdAt AS postCreate , + posts.updatedAt AS postUpdate, + username, + users.createdAt AS userCreate, + users.updatedAt AS userUpdate, + subreddits.id AS theSubredditId, + name, + description, + subreddits.createdAt AS subredditCreate, + subreddits.updatedAt AS subredditUpdate + FROM posts JOIN users + ON users.id = posts.userId + JOIN subreddits + ON subreddits.id = posts.subredditId + ORDER BY posts.createdAt DESC + LIMIT 25` + + ) + .then (response => { + var mapped = response.map(function(value, index){ + var newArr = { + id: value.thePostId, + title: value.title, + url: value.url, + user: { + id: value.userId, + username: value.username, + createdAt: value.userCreate, + updatedAt: value.userUpdate + }, + subreddit:{ + id: value.theSubredditId, + name: value.name, + description: value.description, + createdAt: value.subredditCreate, + updatedAt: value.subredditUpdate + + }, + createdAt: value.postCreate, + updatedAt: value.postUpdate + + }; + return newArr; + }); + + console.log(mapped); + } + + ); + } + getAllSubreddits(){ + return this.conn.query( + ` + SELECT id, name, description, createdAt, updatedAt + FROM subreddits + ORDER BY createdAt DESC + ` ); - } - } - ); - }, - getAllPosts: function(options, callback) { - // In case we are called without an options parameter, shift all the parameters manually - if (!callback) { - callback = options; - options = {}; - } - var limit = options.numPerPage || 25; // if options.numPerPage is "falsy" then use 25 - var offset = (options.page || 0) * limit; - - conn.query(` - SELECT \`id\`,\`title\`,\`url\`,\`userId\`, \`createdAt\`, \`updatedAt\` - FROM \`posts\` - ORDER BY \`createdAt\` DESC - LIMIT ? OFFSET ? - `, [limit, offset], - function(err, results) { - if (err) { - callback(err); - } - else { - callback(null, results); - } - } - ); } - } + } + +module.exports = RedditAPI; \ No newline at end of file diff --git a/reddit.sql b/reddit.sql index 9561e77..5dfc90a 100644 --- a/reddit.sql +++ b/reddit.sql @@ -1,25 +1,41 @@ -- This creates the users table. The username field is constrained to unique -- values only, by using a UNIQUE KEY on that column -CREATE TABLE `users` ( - `id` INT(11) NOT NULL AUTO_INCREMENT, - `username` VARCHAR(50) NOT NULL, - `password` VARCHAR(60) NOT NULL, -- why 60??? ask me :) - `createdAt` TIMESTAMP NOT NULL DEFAULT 0, - `updatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, - PRIMARY KEY (`id`), - UNIQUE KEY `username` (`username`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8; +CREATE TABLE users ( + id INT AUTO_INCREMENT PRIMARY KEY, + username VARCHAR(50) NOT NULL, + password VARCHAR(60) NOT NULL, -- why 60??? ask me :) + createdAt DATETIME NOT NULL, + updatedAt DATETIME NOT NULL, + UNIQUE KEY username (username) +); -- This creates the posts table. The userId column references the id column of -- users. If a user is deleted, the corresponding posts' userIds will be set NULL. -CREATE TABLE `posts` ( - `id` int(11) NOT NULL AUTO_INCREMENT, - `title` varchar(300) DEFAULT NULL, - `url` varchar(2000) DEFAULT NULL, - `userId` int(11) DEFAULT NULL, - `createdAt` TIMESTAMP NOT NULL DEFAULT 0, - `updatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, - PRIMARY KEY (`id`), - KEY `userId` (`userId`), - CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`userId`) REFERENCES `users` (`id`) ON DELETE SET NULL -) ENGINE=InnoDB DEFAULT CHARSET=utf8; \ No newline at end of file +CREATE TABLE posts ( + id INT AUTO_INCREMENT PRIMARY KEY, + title VARCHAR(300) DEFAULT NULL, + url VARCHAR(2000) DEFAULT NULL, + userId INT DEFAULT NULL, + createdAt DATETIME NOT NULL, + updatedAt DATETIME NOT NULL, + KEY userId (userId), -- why did we add this here? ask me :) + CONSTRAINT validUser FOREIGN KEY (userId) REFERENCES users (id) ON DELETE SET NULL +); + +-- This creates the subreddits table. +CREATE TABLE subreddits ( + id INT AUTO_INCREMENT PRIMARY KEY, + name VARCHAR(30) UNIQUE NOT NULL, + description VARCHAR(200) DEFAULT NULL, + createdAt DATETIME NOT NULL, + updatedAt DATETIME NOT NULL +); + +-- This adds a subredditId to the posts table. +ALTER TABLE posts +ADD subredditId INT + +-- This adds a Foreign Key constraining subredditId's values to valid id's in subreddits. +ALTER TABLE posts +ADD FOREIGN KEY (subredditId) +REFERENCES subreddits (id); \ No newline at end of file