problem with 30 mins Time frame for ^NSEI (national stock exchange of india) #1436

Closed
AbhishekSRaut opened this issue Feb 28, 2023 · 32 comments

Comments

@AbhishekSRaut

AbhishekSRaut commented Feb 28, 2023

There is a problem with the 30-minute time frame for Indian stocks.
The 30-minute interval data shows the wrong open and close.
For example:
In Nifty, the first 30-minute candle starts at 9:15 and ends at 9:30,
and after that it continues from 9:30 to 10:00.
Because of this, all our strategies break, since a normal candle starts at 9:15 and ends at 9:45.
The last Nifty candle should be 15 minutes long, i.e. it starts at 15:15 and ends at 15:30.
Please solve this issue as soon as possible, because it breaks all our strategies and scanners.
Thank you.
Regards,
Abhishek Raut.

@ValueRaider
Collaborator

ValueRaider commented Feb 28, 2023

This is because yfinance resamples 30m data from 15m, but naive resampling assumes alignment at hh:00. Why resample? Because apparently requesting Yahoo for 30m data returned 60m, but this was years ago. Today Yahoo's data looks fine, so potentially the resampling is not needed.

If someone wants to solve, do this:

  • resample properly, by first offsetting the interval times by market open time (get this from history_metadata)
  • move this resampling into a unit test, and verify that the resampling is now not necessary (test on variety of stock exchanges)
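A minimal sketch of the first suggestion, assuming pandas 1.1 or later (for the `offset` argument of `resample`). The NSE open time of 09:15 is hard-coded here as a stand-in for the value one would read from `history_metadata`, and the 15-minute bars are synthetic; only their timestamps matter.

```python
import pandas as pd

# Synthetic 15-minute bars for one NSE session (start times 09:15 to 15:15, IST).
idx = pd.date_range("2023-03-03 09:15", "2023-03-03 15:15", freq="15min", tz="Asia/Kolkata")
bars = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# Assumed market open time (hard-coded for illustration, not read from yfinance).
market_open = pd.Timedelta(hours=9, minutes=15)

# Shift the 30-minute bin grid by (open time mod interval), so bins start
# at hh:15 and hh:45 instead of hh:00 and hh:30.
interval = pd.Timedelta(minutes=30)
offset = market_open % interval  # 15 minutes

# Count how many 15-minute bars land in each 30-minute bin.
aligned = bars.resample(interval, offset=offset).size()
print(aligned.index[:3].strftime("%H:%M").tolist())  # ['09:15', '09:45', '10:15']
print(aligned.iloc[-1])  # 1: the final 15:15-15:30 candle holds a single 15-minute bar
```

Deriving the offset from the open time rather than hard-coding "15min" is what would let the same code work on exchanges that open on the hour.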

@ValueRaider
Collaborator

Regarding 60m - I can't reproduce, all the hourly intervals start/end at hh:15

@AbhishekSRaut
Author

AbhishekSRaut commented Mar 1, 2023


I do not understand your solution. Can you please explain how to solve this in Python, or provide the Python code?
Thank you.
Regards,
Abhishek Raut.

@AbhishekSRaut
Author

AbhishekSRaut commented Mar 1, 2023

Regarding 60m - I can't reproduce, all the hourly intervals start/end at hh:15

I am sorry; I rechecked, and the 60-minute time frame is working well.
Its candles end at hh:15.
Sorry for the confusion.

@ValueRaider
Collaborator

I do not understand your solution.

That's ok. Someone else will, someone familiar with yfinance code.

@AbhishekSRaut
Author

That's ok. Someone else will, someone familiar with yfinance code.

Does that mean we have to wait for a yfinance update?
How long can we expect that to take?
Or is there a temporary workaround?

@AbhishekSRaut
Author

That's ok. Someone else will, someone familiar with yfinance code.

I have solved this issue.
Anyone who faces the same difficulty as me can try the example code below:

import yfinance as yf
from datetime import datetime, timedelta
import pytz
import pandas as pd

# Set the time zone for the data
tz = pytz.timezone('Asia/Kolkata')

# Define the start and end times for the data
now = datetime.now(tz)
end_time = now.replace(second=0, microsecond=0)
start_time = end_time - timedelta(days=30)

# Define a custom time interval for the data
interval = '60m'

# Define the opening and closing times for the custom interval
open_time_1 = timedelta(minutes=15)
close_time_1 = timedelta(minutes=45)
open_time_2 = timedelta(minutes=45)
close_time_2 = timedelta(minutes=15)

# Define a custom function to adjust the timestamps for the custom interval
def adjust_timestamp(timestamp):
    if timestamp.minute < 15:
        return timestamp.replace(minute=15, second=0, microsecond=0)
    elif timestamp.minute < 45:
        return timestamp.replace(minute=45, second=0, microsecond=0)
    else:
        return timestamp + timedelta(hours=1)

# Download the data with the custom time interval and adjust the timestamps
data = yf.download('^NSEI', start=start_time, end=end_time, interval=interval)
data = data.rename_axis('Date_Time').reset_index()
data['Date_Time'] = data['Date_Time'].apply(adjust_timestamp)
data = data.set_index('Date_Time')

# Save the data to a CSV file
data.to_csv('nifty.csv')

# Print a message to confirm that the data has been saved
print(f"Data saved to nifty.csv from {start_time.strftime('%Y-%m-%d')} to {end_time.strftime('%Y-%m-%d')}")

@ValueRaider
Collaborator

There's a real bug here, leave it open. Your hack doesn't fix the underlying problem: the first interval of the day is wrong.

@ValueRaider ValueRaider reopened this Mar 3, 2023
@AbhishekSRaut
Author

AbhishekSRaut commented Mar 4, 2023

There's a real bug here, leave it open.
Your hack doesn't fix the underlying problem, the first interval of day is wrong.

You are right, I didn't notice this.
There is also one more bug in it: it takes the close of the previous candle directly as the open of the next one, but on the real chart it is not like that.
I will try to fix it; if you can, please help me solve this.
Thank you.

@ValueRaider
Collaborator

quotes2 = quotes.resample('30T')
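To see why a plain `resample('30T')` misaligns NSE data, here is a small sketch on synthetic 15-minute bars. It uses `'30min'`, the same frequency as `'30T'` (newer pandas versions prefer the `min` alias), and simply counts how many 15-minute bars fall into each 30-minute bin; it is a simplified model, not the full aggregation done in base.py.

```python
import pandas as pd

# Synthetic 15-minute bars for one NSE session (start times 09:15 to 15:15, IST).
idx = pd.date_range("2023-03-03 09:15", "2023-03-03 15:15", freq="15min", tz="Asia/Kolkata")
bars = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# Default resampling anchors bins to midnight, so they start at hh:00 and hh:30.
sizes = bars.resample("30min").size()

print(sizes.index[:3].strftime("%H:%M").tolist())  # ['09:00', '09:30', '10:00']
print(sizes.iloc[0])  # 1: the 09:00 bin holds only the 09:15 bar
```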

@ivan23kor
Contributor

Hi @AbhishekSRaut,

30min intervals for ^NSEI on YahooFinance start at HH:00 - [9:00AM; 9:30AM], [9:30AM; 10:00AM], [10:00AM; 10:30AM], ... [2:30PM; 3:00PM]

60min intervals for ^NSEI on YahooFinance start at HH:15 - [9:15AM; 10:15AM], [10:15AM; 11:15AM], [11:15AM; 12:15PM], ... [2:15PM; 3:15PM]

yfinance is a scraping package so it will present the same data as on finance.yahoo.com.

@ValueRaider
Copy link
Collaborator

@ivan23kor Read the thread carefully, you are wrong.

@ivan23kor
Copy link
Contributor

@ValueRaider you are right that 30m data is resampled from 15m.
I am not debating that, I am saying that 60m data starts at HH:15 on YahooFinance.

As I understood @AbhishekSRaut, he wants 60m data to start at HH:00. Is that the issue?

@ValueRaider
Collaborator

No they want it aligned to HH:15. Yahoo returns this, but yfinance loses this when it resamples.

@ivan23kor
Contributor

ivan23kor commented Mar 4, 2023

I am running this script:

import yfinance as yf
print(yf.download('^NSEI', start='2023-03-03', interval='30m'))
print(yf.download('^NSEI', start='2023-03-03', interval='60m'))
print(yf.download('^NSEI', start='2023-03-03', interval='1h'))

and this is the output I am getting:

[*********************100%***********************]  1 of 1 completed
                             Open          High           Low         Close     Adj Close  Volume
Datetime                                                                                         
2023-03-03 09:30:00  17467.300781  17495.250000  17453.099609  17485.949219  17485.949219       0
2023-03-03 10:00:00  17485.750000  17527.300781  17474.000000  17521.199219  17521.199219       0
2023-03-03 10:30:00  17520.500000  17555.849609  17516.750000  17542.099609  17542.099609       0
2023-03-03 11:00:00  17542.599609  17569.000000  17534.900391  17561.750000  17561.750000       0
2023-03-03 11:30:00  17562.150391  17578.449219  17561.050781  17566.599609  17566.599609       0
2023-03-03 12:00:00  17566.300781  17578.050781  17559.550781  17560.650391  17560.650391       0
2023-03-03 12:30:00  17559.949219  17563.949219  17535.900391  17556.199219  17556.199219       0
2023-03-03 13:00:00  17557.250000  17579.050781  17553.400391  17573.150391  17573.150391       0
2023-03-03 13:30:00  17573.449219  17607.650391  17572.699219  17598.000000  17598.000000       0
2023-03-03 14:00:00  17597.750000  17628.349609  17596.949219  17624.250000  17624.250000       0
2023-03-03 14:30:00  17623.800781  17644.699219  17615.099609  17627.400391  17627.400391       0
2023-03-03 15:00:00  17626.699219  17627.050781  17585.000000  17592.300781  17592.300781       0
[*********************100%***********************]  1 of 1 completed
                             Open          High           Low         Close     Adj Close  Volume
Datetime                                                                                         
2023-03-03 09:15:00  17451.250000  17514.300781  17430.500000  17512.650391  17512.650391       0
2023-03-03 10:15:00  17513.900391  17555.849609  17508.550781  17547.849609  17547.849609       0
2023-03-03 11:15:00  17547.500000  17578.449219  17534.900391  17570.550781  17570.550781       0
2023-03-03 12:15:00  17570.550781  17575.250000  17535.900391  17561.650391  17561.650391       0
2023-03-03 13:15:00  17561.050781  17613.750000  17560.550781  17609.750000  17609.750000       0
2023-03-03 14:15:00  17609.699219  17644.699219  17587.199219  17591.900391  17591.900391       0
2023-03-03 15:15:00  17592.449219  17598.349609  17585.000000  17594.349609  17594.349609       0
[*********************100%***********************]  1 of 1 completed
                                   Open          High           Low         Close     Adj Close  Volume
Datetime                                                                                               
2023-03-03 09:15:00+05:30  17451.250000  17514.300781  17430.500000  17512.650391  17512.650391       0
2023-03-03 10:15:00+05:30  17513.900391  17555.849609  17508.550781  17547.849609  17547.849609       0
2023-03-03 11:15:00+05:30  17547.500000  17578.449219  17534.900391  17570.550781  17570.550781       0
2023-03-03 12:15:00+05:30  17570.550781  17575.250000  17535.900391  17561.650391  17561.650391       0
2023-03-03 13:15:00+05:30  17561.050781  17613.750000  17560.550781  17609.750000  17609.750000       0
2023-03-03 14:15:00+05:30  17609.699219  17644.699219  17587.199219  17591.900391  17591.900391       0
2023-03-03 15:15:00+05:30  17592.449219  17598.349609  17585.000000  17594.349609  17594.349609       0

which corresponds to Yahoo Finance:

  1. 30m: [chart screenshot not preserved]

  2. 60m: [chart screenshot not preserved]

@AbhishekSRaut please provide your code and the output you are getting, and explain what the desired output is, if it is not the same as finance.yahoo.com

@ValueRaider
Collaborator

ValueRaider commented Mar 4, 2023

@ivan23kor You are presuming the data returned by Yahoo via the API matches the chart contents. Not necessarily, and not in this case. PLEASE review base.py and how it handles 30m interval.

@ivan23kor
Contributor

@ValueRaider 30m data is downsampled 15m data.

@ValueRaider can you explain to me what the problem is, if the output of yfinance.download is the same data as on Yahoo Finance (the reference)?

@ValueRaider
Collaborator

ValueRaider commented Mar 4, 2023

Problem 1: ask Yahoo for 30m data and response is aligned to HH:15, but the 15m->downsampled->30m is aligned to HH:00, so yfinance is not simply returning what Yahoo returns.

Problem 2: first 15m interval disappears during downsampling, so the Open, High and Low of first downsampled 30m interval are wrong.
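Problem 2 can be reproduced on a few synthetic bars. The sketch below is a simplified model, not yfinance's actual code: it downsamples with hh:00 bins and then drops any row stamped before the 09:15 open, as a stand-in for a pre/post-market filter. The OHLC values are made up.

```python
import pandas as pd

# Synthetic 15-minute OHLC bars for the first hour of an NSE session.
idx = pd.date_range("2023-03-03 09:15", periods=4, freq="15min", tz="Asia/Kolkata")
bars = pd.DataFrame(
    {
        "Open": [100.0, 103.0, 104.0, 102.0],
        "High": [110.0, 105.0, 106.0, 104.0],
        "Low": [95.0, 101.0, 102.0, 100.0],
        "Close": [103.0, 104.0, 102.0, 103.0],
    },
    index=idx,
)
ohlc = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}

# Naive hh:00 downsampling puts the 09:15 bar alone in a bin labelled 09:00 ...
naive = bars.resample("30min").agg(ohlc)

# ... and a hypothetical "drop rows before market open" step then discards that bin.
market_open = pd.Timestamp("2023-03-03 09:15", tz="Asia/Kolkata")
naive = naive[naive.index >= market_open]

# The first surviving candle (09:30) misses the session's true open, high and low.
print(naive.iloc[0].tolist())  # [103.0, 106.0, 101.0, 102.0]
print(bars["Open"].iloc[0], bars["High"].max(), bars["Low"].min())  # 100.0 110.0 95.0
```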

@AbhishekSRaut
Copy link
Author

@ivan23kor My issue is not with the 60-minute time frame.
The issue is with the 30-minute interval.
In the Indian stock market, 30-minute intervals run from hh:15 to hh:45 and from hh:45 to hh:15,
but yfinance returns hh:00 to hh:30 and hh:30 to hh:00,
as @ValueRaider said previously.
Since my pandas knowledge is basic, I don't know how to deal with this or how to solve the problem.

@AbhishekSRaut AbhishekSRaut changed the title problem with 30 and 60 mins Time frame for ^NSEI (national stock exchange of india) problem with 30 mins Time frame for ^NSEI (national stock exchange of india) Mar 5, 2023
@AbhishekSRaut
Author

AbhishekSRaut commented Mar 7, 2023

I have solved this problem with the following code:

import yfinance as yf
import pandas as pd
import os

#function to convert 15 mins data to 30 mins data.
def mins_30(df):
	datetimelist=[]
	openlist=[]
	highlist=[]
	lowlist=[]
	closelist=[]
	volumelist=[]
	for i in range(len(df)):
		datetime1=df.iloc[i,0]
		# hour1 is the hour extracted from the datetime column.
		# Assumes strings like '2023-03-03 09:15:00' with no UTC-offset suffix.
		hour1=int(f'{datetime1[-8]}{datetime1[-7]}')
		# min1 is the minute extracted from the datetime column.
		min1=int(f'{datetime1[-5]}{datetime1[-4]}')
		if min1 in (15, 45) and hour1 != 15:
			# Start of a regular 30-minute candle (hh:15 or hh:45). yfinance labels each candle with its start time, not its end time.
			datetimelist.append(df.iloc[i,0])
			openlist.append(df.iloc[i,1])
			if i + 1 < len(df):
				# Merge with the next 15-minute candle (hh:30 or hh:00): highest high, lowest low, its close, summed volume.
				highlist.append(max(df.iloc[i,2],df.iloc[i+1,2]))
				lowlist.append(min(df.iloc[i,3],df.iloc[i+1,3]))
				closelist.append(df.iloc[i+1,4])
				volumelist.append(df.iloc[i,5]+df.iloc[i+1,5])
			else:
				# No next candle yet (the current 30-minute candle is still forming), so use this 15-minute candle alone.
				highlist.append(df.iloc[i,2])
				lowlist.append(df.iloc[i,3])
				closelist.append(df.iloc[i,4])
				volumelist.append(df.iloc[i,5])
		elif min1==15 and hour1==15:
			# Last candle of the day (15:15 to 15:30) is only 15 minutes long, so copy it as-is.
			datetimelist.append(df.iloc[i,0])
			openlist.append(df.iloc[i,1])
			highlist.append(df.iloc[i,2])
			lowlist.append(df.iloc[i,3])
			closelist.append(df.iloc[i,4])
			volumelist.append(df.iloc[i,5])
		else:
			# hh:30 and hh:00 candles were already merged into the previous row.
			continue
	new_df={'datetime':datetimelist,'open':openlist,'high':highlist,'low':lowlist,'close':closelist,'volume':volumelist}
	return new_df

def third():
	check1 = os.path.exists('nifty.csv')
	if check1 == True:
		os.remove('nifty.csv')
	data = yf.download(tickers='^NSEI', period='60d', interval='15m', group_by='ticker', auto_adjust=False)
	df = pd.DataFrame(data)
	df.reset_index(drop=False, inplace=True)
	df.rename(columns={'Datetime': 'datetime'}, inplace=True)
	df.to_csv('nifty.csv', index=False, columns=['datetime', 'Open', 'High', 'Low', 'Close', 'Volume'])

third()
old_df = pd.read_csv('nifty.csv')
old_df = pd.DataFrame(old_df)
newdf= mins_30(old_df)
newdf=pd.DataFrame(newdf)
if os.path.exists('nifty_30mins.csv'):
	os.remove('nifty_30mins.csv')
newdf.to_csv('nifty_30mins.csv',index=False)

This code may look complicated, but I can't simplify it further, since I am not a professional developer and my pandas skills are limited.
I don't think it has any bugs now.
Special thanks to @ValueRaider: if they had given me the full code, I wouldn't have learned pandas, and they also replied to me quickly.
Please review the code and tell me; if it works perfectly, then we can close this issue.

@ValueRaider
Collaborator

ValueRaider commented Mar 7, 2023

@AbhishekSRaut I'm not reviewing that code, it's awful. Pandas isn't difficult, do some tutorials, and Numpy tutorials too because I suspect you don't understand vector programming.
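The vectorized style ValueRaider refers to can be illustrated with a short pandas sketch of the same 15m to 30m conversion that `mins_30` performs: one `resample` with an offset plus a per-column aggregation replaces the row loop. The helper name `resample_ohlcv_30m` is hypothetical; it assumes a DatetimeIndex and Open/High/Low/Close/Volume columns, and the `offset` argument needs pandas 1.1 or later. The bars are synthetic.

```python
import pandas as pd


def resample_ohlcv_30m(df15: pd.DataFrame, offset: str = "15min") -> pd.DataFrame:
    """Hypothetical helper: combine 15-minute OHLCV bars into 30-minute bars
    whose bins start at hh:15 and hh:45.

    Assumes df15 has a DatetimeIndex in exchange-local time and columns
    Open, High, Low, Close and Volume.
    """
    rules = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
    out = df15.resample("30min", offset=offset).agg(rules)
    # Bins with no underlying bars (nights, weekends) have a NaN Open; drop them.
    return out.dropna(subset=["Open"])


# Synthetic example: two NSE sessions of 15-minute bars (start times 09:15 to 15:15).
day1 = pd.date_range("2023-03-03 09:15", "2023-03-03 15:15", freq="15min", tz="Asia/Kolkata")
day2 = pd.date_range("2023-03-06 09:15", "2023-03-06 15:15", freq="15min", tz="Asia/Kolkata")
idx = day1.union(day2)
n = len(idx)
df15 = pd.DataFrame(
    {
        "Open": [float(i) for i in range(n)],
        "High": [float(i) + 1 for i in range(n)],
        "Low": [float(i) - 1 for i in range(n)],
        "Close": [float(i) + 0.5 for i in range(n)],
        "Volume": [10] * n,
    },
    index=idx,
)

bars30 = resample_ohlcv_30m(df15)
print(len(bars30))  # 26: 13 candles per session
print(bars30.index[12].strftime("%H:%M"), bars30["Volume"].iloc[12])  # 15:15 10
```

The 15:15 row covers only 15:15 to 15:30, which matches the short last candle described in the original report. Like `mins_30`, this works around the symptom in user code rather than changing how yfinance itself resamples.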

@AbhishekSRaut
Author

I'm not reviewing that code, it's awful. Pandas isn't difficult, do some tutorials, and Numpy tutorials too because I suspect you don't understand vector programming.

Yes, you are right, I do not know vector programming,
and I will surely follow your advice.
But about this code: if you run it and check, the data comes out in the desired format.
If you can rewrite it more simply, that would be good, as I am not familiar with many pandas functions.
Since there is no bug and the issue is resolved, should we close this?
*Note: I am not a professional developer, so you may find the code above hard to follow.
Sorry for this.
This sudden change in yfinance broke my strategies and scanner; that's why I went into so much depth.

@ValueRaider
Collaborator

if you can update the code from complex to simple code

Impossible, because this is fundamentally the wrong way to fix the bug. You could look inside base.py and see how '30m' is handled specially; maybe it should be handled normally like the other intervals.

So please don't close the issue.

@AbhishekSRaut
Author

Impossible because this is fundamentally wrong way to fix bug. You could look inside base.py and see how '30m' is handled specially, maybe should be handled normally like other intervals.
So please don't close Issue.

Yeah, you are right. I will try...

@AbhishekSRaut
Author

see how '30m' is handled specially, maybe should be handled normally like other intervals.

You are right.
In base.py, if we comment out lines 631-632 and lines 725-734, the issue is also solved.
Technically, we can say the Yahoo bug that yfinance works around may no longer exist, because with those lines commented out we still get correct interval data from Yahoo.

@ivan23kor
Contributor

@ValueRaider, referring to the two problems you mentioned here:

  1. so yfinance is not simply returning what Yahoo returns.
    Yahoo's frontend at finance.yahoo.com is the reference that yfinance must always match. Yes, for the bug @AbhishekSRaut found, yfinance might process the backend data differently, but that doesn't matter as long as the final output is equivalent to finance.yahoo.com

  2. @ValueRaider, you are right here, the first data point (9:00AM) is dropped from the final output, and this is the bug here. I have traced it down to fix_Yahoo_returning_prepost_unrequested, called in base.py.

@AbhishekSRaut is right that ^NSEI opens at 9:15AM and closes at 3:15PM local time. However, Yahoo Finance's frontend returns HH:00 timestamps for some reason, and yfinance should follow that behaviour.

@ivan23kor
Contributor

yfinance returns the right timestamps for ^NSEI (9:30AM, 10:00AM, 10:30AM, ...), except it misses the first one (9:00AM). Right here means equal to Yahoo Finance frontend data.

I am against switching to 30m data for 30m intervals, because Yahoo's frontend gets 15m from the backend. That's what yfinance should keep doing. I suggest disabling fix_Yahoo_returning_prepost_unrequested for this particular case (^NSEI + 15m interval).

@ValueRaider
Collaborator

ValueRaider commented Mar 7, 2023

I suggest disabling fix_Yahoo_returning_prepost_unrequested for this particular case

I think exchange-specific code is a bad idea; a better fix is to move the resampling to after that fix.

I don't have strong opinion on hh:00 vs hh:15. @AbhishekSRaut why do you need 30m aligned to hh:15?

@AbhishekSRaut
Copy link
Author

why do you need 30m aligned to hh:15?

Because for ^NSEI the standard 30-minute interval starts when the market opens at 9:15. This applies not only to ^NSEI but to both Indian stock exchanges, NSE and BSE;
that is, all stocks listed on both exchanges follow this interval.
Since the official 30-minute interval is hh:15 to hh:45, all technical analysis is based on hh:15 to hh:45 levels.
If we follow hh:00 to hh:30, the technical levels will be wrong and result in wrong trades.

@ValueRaider
Collaborator

if we follow hh:00 to hh:30, then technical levels will be wrong, and result in wrong trade.

Good argument.

@ivan23kor I must point out that Yahoo doing something isn't necessarily a good thing. They routinely have errors in non-US price data that yfinance fixes - see the fix_*() methods and repair.

@ivan23kor
Contributor

@AbhishekSRaut thanks for pointing out the two Indian stock exchanges that have this issue.

@ValueRaider the order of fix_Yahoo_returning_prepost_unrequested and downsampling doesn't matter in this case, as the trading start time is 9:15 but the downsampling interval is 30m.

@AbhishekSRaut @ValueRaider I've opened #1447 to fix this issue. Let's move the conversation there.

This was referenced Aug 15, 2024
@aleksfasting
Contributor

#2027 has been merged and solves this issue. It should be closed.


4 participants