BreakOut Strategy - Backtesting in Python (PART-2)

In the last post, we prepared our data, installed Python with PyCharm as the IDE, and installed the pandas and NumPy modules. Now we are ready to start the analysis in Python.

Reading Data from CSV in Dataframe

The pandas module has a built-in function for reading data from a CSV file. The function is read_csv, and since it is part of the pandas module, it is accessed through the dot operator, as in pd.read_csv.

01: import pandas as pd
02: df = pd.read_csv("myDataFile.csv")
03: print(df)

Above is the code for reading a CSV file into a dataframe and printing the dataframe with the print function. The line numbers 01, 02 and 03 are not part of the code; they are there only to make the explanation easier to follow for newcomers.

Line number 01 imports the "pandas" module under the short name "pd". In general, you can write import pandas as myShortNameForPandasModule, and from then on every function provided by pandas is accessed through "myShortNameForPandasModule". One of these functions, "read_csv", reads the data stored in a CSV file and holds it in a Python variable in a spreadsheet-like format. Since "read_csv" is a function provided by pandas, it would be referred to as "myShortNameForPandasModule.read_csv".

In the above example, line number 01 uses pd as the short name. This was done intentionally so that we do not have to key in the full name pandas each time and can simply work with pd.
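To make the aliasing idea concrete, here is a minimal sketch showing that the short name is only a second label for the same module. The alias myShortNameForPandasModule is the illustrative name from the explanation above; the two result variables are names chosen for this example.

```python
import pandas as pd
import pandas as myShortNameForPandasModule  # any valid name can serve as the alias

# Both names refer to the very same module object.
same_module = pd is myShortNameForPandasModule

# So the dot operator reaches the same read_csv function through either name.
same_function = pd.read_csv is myShortNameForPandasModule.read_csv

print(same_module, same_function)
```

Most pandas code uses pd as the alias, so sticking with it makes examples from other sources easier to follow.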

In line number 02, the pandas function read_csv reads the CSV file and the data of the entire file is stored in the variable df. This variable is referred to as a dataframe. A dataframe holds the entire data in rows and columns inside Python, much like a spreadsheet.
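The row-and-column structure can be inspected directly. The sketch below is only an illustration: it assumes a few made-up sample rows held in memory (via io.StringIO) in place of "myDataFile.csv", with the column names used for the data in this series.

```python
import io

import pandas as pd

# Assumption: a small in-memory sample standing in for "myDataFile.csv".
sample_csv = io.StringIO(
    "Datetime,Open,High,Low,Close,Volume\n"
    "2021-10-01T09:15:00,448.05,452.10,444.05,451.35,2323271\n"
    "2021-10-01T09:30:00,451.15,451.95,449.80,450.30,688008\n"
    "2021-10-01T09:45:00,450.30,450.30,447.50,448.50,861238\n"
)

df = pd.read_csv(sample_csv)

n_rows, n_cols = df.shape            # (number of rows, number of columns)
column_names = list(df.columns)      # taken from the header row of the CSV
first_volume = df.loc[0, "Volume"]   # one cell: row label 0, column "Volume"
close_column = df["Close"]           # one whole column

print(n_rows, n_cols)
print(column_names)
print(first_volume)
```

With a real file, the only change needed is to pass the file name to read_csv instead of sample_csv.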

The last line, 03, prints the data stored in the df variable. The output generated by the print instruction is shown below.

                 Datetime    Open    High     Low   Close   Volume
0     2021-10-01T09:15:00  448.05  452.10  444.05  451.35  2323271
1     2021-10-01T09:30:00  451.15  451.95  449.80  450.30   688008
2     2021-10-01T09:45:00  450.30  450.30  447.50  448.50   861238
3     2021-10-01T10:00:00  448.85  450.15  448.00  449.75   630605
4     2021-10-01T10:15:00  449.70  452.80  449.70  452.15   787878
...                   ...     ...     ...     ...     ...      ...
1008  2021-11-30T14:30:00  467.95  467.95  463.75  464.90   640960
1009  2021-11-30T14:45:00  464.75  465.90  462.80  463.80   779615
1010  2021-11-30T15:00:00  463.50  464.20  460.50  460.90  2867190
1011  2021-11-30T15:15:00  460.90  461.20  458.30  459.95  3249165
1012  2021-11-30T15:30:00  459.75  459.75  459.10  459.10      616

[1013 rows x 6 columns]

You can observe from the above output that the data is stored in 1013 rows and 6 columns. An index has been added in the leftmost column, running from 0 to 1012.
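The index can be examined in code as well. The following sketch assumes five sample rows in memory in place of the full 1013-row file, so the labels run from 0 to 4 here. It also uses head and tail, which return the first and last rows; by default pandas shortens the printout of a long dataframe in a similar way, which is why a row of dots appears in the middle of the output.

```python
import io

import pandas as pd

# Assumption: five sample rows standing in for the full data file.
sample_csv = io.StringIO(
    "Datetime,Open,High,Low,Close,Volume\n"
    "2021-10-01T09:15:00,448.05,452.10,444.05,451.35,2323271\n"
    "2021-10-01T09:30:00,451.15,451.95,449.80,450.30,688008\n"
    "2021-10-01T09:45:00,450.30,450.30,447.50,448.50,861238\n"
    "2021-10-01T10:00:00,448.85,450.15,448.00,449.75,630605\n"
    "2021-10-01T10:15:00,449.70,452.80,449.70,452.15,787878\n"
)

df = pd.read_csv(sample_csv)

index_labels = list(df.index)   # labels added by pandas, starting at 0
last_label = df.index[-1]       # always the number of rows minus one
first_two = df.head(2)          # first two rows
last_two = df.tail(2)           # last two rows, original labels kept

print(index_labels)
print(last_label, len(df))
print(first_two)
print(last_two)
```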

In the next post, we will study basic functions like ATR in Python and identify swing highs and swing lows.

Link to Part -03 post
