Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs