Skip to content

pb1999/Portfolio_Optimization

Repository files navigation

Portfolio Optimization

Introduction

In this project, I will be using Deep Learning to predict future stock market returns and use those predictions to construct a portfolio with the top performing stocks that automatically rebalances on a weekly basis.

Data

All the historical stock market data will be collected using the yahoo finance API (yfinance). However, since there is no market screener API to automatically get key market analytics, I download a CSV file from the Nasdaq website containing the largest publicly traded companies and from those, I select the largest 30 to use in the model. Moreover, since I am interested in asset performance, I will convert prices to weekly returns. A weekly interval is preferred to a daily interval, as it better captures the general trend of the market and can be more reliable when making predictions.

# Read the CSV file and keep only the largest 30 companies
df = pd.read_csv('nasdaq_screener_1613878479181.csv')
stocks = df[~(df['IPO Year']>2020)].dropna().sort_values(by='Market Cap')['Symbol'].iloc[-30:].tolist()
stocks.sort()

# Each element of the list is a dataframe with the Adjusted close price of each ticker.
securities_by_date = []
for i in stocks:
    df = yf.download(i,start= '2008-01-01',end=None , interval = '1wk')[['Adj Close']].dropna().pct_change(periods = 5).dropna().rename(columns={'Adj Close':i})
    securities_by_date.append(df)

Model

It is very important to scale features before training a neural network, so we must normalize the data sets. To achieve that, I split the data into training 70% and testing 30% and subtract from both the mean and divide by the standard deviation. The mean and standard deviation should only be from the training set so that the model has no access to the values in the test set.

class Momentum_Model:
    
    def __init__(self,df, window, future):
        
        self.df, self.window, self.future, = df, window, future
        
        # Split the data to train and test
        train_df = self.df[0:int(len(self.df)*0.7)]
        test_df = self.df[int(len(self.df)*0.7):]
        # Get means and std for the columns of the dataframe
        train_mean = train_df.mean()
        train_std = train_df.std()
        # Normalize the data by subtracting the mean and std of the train set
        train_df = (train_df - train_mean) / train_std
        test_df = (test_df - train_mean) / train_std
        # Convert the normalized data to numpy array.
        train_df, test_df = train_df.to_numpy(), test_df.to_numpy()
        
        self.train_df, self.test_df, self.train_mean, self.train_std = train_df, test_df, train_mean, train_std 

After normalizing the data, the next step is to reshape them in order to fit our model. The LSTM input layer has three dimensions (samples, timesteps, features). There is only one feature (weekly stock return) and the number of samples is the length of the data set. Moreover, for all samples in the model we use the 5 previous observations in order to predict the 6th, which means that the X_train , X_test sets will be five-dimentional np.array with input shapes (len(X_train),5,1) , (len(X_test),5,1) and y_train , y_test will have input shapes (len(y_train),1) , (len(y_test),1) .

def layout(self):
    # Train Layout
    X_train = []
    y_train = []
    for i in range(self.window,len(self.train_df) - self.future):
        X_train.append(self.train_df[i - self.window:i])  
        y_train.append(self.train_df[i + self.future,0:1])  
        
    X_train, y_train = np.array(X_train),np.array(y_train)
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1],self.train_df.shape[1])
        
    # Test Layout
    X_test = []
    y_test = []
    for i in range(self.window,len(self.test_df)-self.future):
        X_test.append(self.test_df[i - self.window:i])
        y_test.append(self.test_df[i + self.future,0:1])
            
    X_test, y_test = np.array(X_test),np.array(y_test)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], self.test_df.shape[1])
        
    return X_train, y_train, X_test, y_test
        
        

Model Architecture:

I will be using a stacked LSTM model comprised of one input layer, three LSTM layers and an output layer (which will be a dense layer with only one output). The first LSTM layer will have 50 units and the second and third will have 30 units. For both the first and the second layers return_sequences=True because we want the internal state of the previous layer to pass onto the next layers. Also, I will be using a 20% dropout rate so that the model focuses on more recent data.

def Rnn_model(self, X):
        
        self.X = X
        
        model = Sequential()
        model.add(LSTM(units=50,activation = 'tanh' , return_sequences=True, input_shape=(self.X.shape[1], self.X.shape[2])))
        model.add(Dropout(0.2))
        model.add(LSTM(units = 30,activation ='sigmoid', return_sequences = True))
        model.add(Dropout(0.2))
        model.add(LSTM(units = 30,activation ='sigmoid', return_sequences = False))
        model.add(Dense(1))
        
        return model
        

Moreover, I will be using the Adam algorithm as an optimizer with a learning rate of .001 and a decay of 1e-6. Once training is complete and we have predicted the values of the test set, we must rescale the predictions by multiplying the standard deviation and adding the mean of the train set. We do this because we want to establish a trading strategy that picks stocks based on higher performance; so, if we do not rescale the features, we will be unable to distinguish which stocks were performing better since each stock has different means and standard deviations.

def predictions(self, model, epochs, X_train, y_train, X_test, y_test):
    self.model, self.epochs, self.X_train, self.y_train, self.X_test, self.y_test  = model, epochs, X_train, y_train, X_test, y_test
    opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
    self.model.compile(optimizer = opt, loss = 'mean_squared_error')
    self.model.fit(self.X_train, self.y_train, epochs = self.epochs, batch_size = 30, verbose = 0)
        
    predictions = self.model.predict(self.X_test)*self.train_std.tolist()[0] + self.train_mean.tolist()[0]
        
    y_test_inverse_transform = self.y_test*self.train_std.tolist()[0] + self.train_mean.tolist()[0]
        
    return [predictions.flatten(), y_test_inverse_transform.flatten()]
    
    

Now that we have everything, we create a loop that runs the model and saves the predictions of future stock returns for all stocks.

stock_predictions = []
for df in securities_by_date:  
    
    data = Momentum_Model(df,5,0)
    X_train, y_train, X_test, y_test = data.layout()
    volume_model = data.Rnn_model(X_train)
    predictions = data.predictions(volume_model, 100, X_train, y_train, X_test,y_test)
    stock_predictions.append(predictions)
    

For reference, below you can see what the model predicts for Apple.

alt text

Trading Strategy and Portfolio Rebalancing

We already saved the predictions of the model for each stock. The next step is to combine all the predictions in one dataset and keep only the top 10. After that, at each timestep (week), we implement a simple trading algorithm of buying the open market price of the top 10 stocks today and selling the open market price one week from today. Therefore, we create a portfolio comprised of 10 stocks (each stock has equal weight) that rebalances on a weekly basis. Finally, we assume zero transaction costs when buying or selling securities.

prediction_data = [stock_predictions[i][0].tolist() for i in range(len(stock_predictions))]

list_of_predictions = []
dates = pd.date_range(start='1/1/2008', end=datetime.today().strftime('%Y-%m-%d')).tolist()[::7]
for i in range(len(prediction_data)):
    
    df = pd.DataFrame(prediction_data[i]).rename(columns={0:stocks[i]})
    df['Date'] = dates[-len(df.index):]
    list_of_predictions.append(df.set_index('Date'))
    
df_predictions = reduce(lambda left,right: pd.merge(left,right,on='Date', how = 'outer'), list_of_predictions).fillna(-1000)

column_names = ['10th','9th', '8th', '7th', '6th', '5th', '4th', "3rd", "2nd", "1st"]
    
performance = pd.DataFrame(columns = column_names)
for i in range(len(df_predictions)):
    performance.loc[len(performance)] = df_predictions.iloc[i].T.sort_values().iloc[-10:].reset_index()['index'].tolist()
    
performance['Date'] = dates[-len(performance.index):]
10th 9th 8th 7th 6th 5th 4th 3rd 2nd 1st Date
SHOP ABBV QCOM TSM JD AMGN SE PDD ASML TSLA 2021-01-19
MSFT GOOG AMGN QCOM TSM HDB ASML JD TSLA SE 2021-01-26
TSM JD HDB ASML MSFT NVDA TSLA GOOG BABA SE 2021-02-02
CRM ASML SHOP NFLX NVDA TSLA MSFT GOOG BABA SE 2021-02-09
HDB CRM MSFT BABA ZM NVDA GOOG SQ SHOP SE 2021-02-16
portfolio_performance = []
portfolio_weights = np.array([1/10,1/10,1/10,1/10,1/10,1/10,1/10,1/10,1/10,1/10])

for i in range(len(performance.index) - 2):
    weekly_securities = yf.download(performance.iloc[i][0:10].tolist(),start= performance.iloc[i][10] ,end= performance.iloc[i+1][10], interval = '1wk')['Open']
    
    weekly_securities_2 = weekly_securities.reset_index().dropna().drop_duplicates(subset=['Date']).set_index('Date').pct_change().dropna().values.flatten()
    portfolio_performance.append(weekly_securities_2)
 
 final_performance = []
for j in portfolio_performance:
        
    try:
        final = j@portfolio_weights
        
    except TypeError:
        final = math.nan
        
    except ValueError:
        final = math.nan
        
    final_performance.append(final)
    

Finally, we compute the performance of the portfolio that uses this simple trading algorithm and compare it to the performance of SPY.

alt text

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors