Many recent works have focused on dynamically switching some base stations (BSs) on and off to improve energy efficiency in radio access networks (RANs). In this work, we extend the research on BS switching operations, which should keep up with traffic load variations. Rather than relying on forecasts of dynamic traffic loads, which remain quite challenging to predict precisely, the proposed method formulates the traffic variations as a Markov decision process. A BS switching operation scheme based on a reinforcement learning framework is then designed to minimize the energy consumption of RANs. Furthermore, a transfer actor-critic algorithm (TACT), which utilizes learning expertise transferred from historical periods or neighboring regions, is used to speed up the ongoing learning process. The proposed TACT algorithm provides a performance jumpstart and demonstrates the feasibility of a significant improvement in energy efficiency.
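To make the ingredients named above more concrete, the following is a minimal sketch of a tabular actor-critic learner whose action preferences are blended with a transferred policy. It is not the TACT algorithm itself, only an illustration under stated assumptions: the single BS with two actions, the three traffic levels, the transition matrix `TRANSITIONS`, the `cost` function, the learning rates, and the decaying transfer rate `zeta` are all hypothetical choices made for this example.

```python
import math
import random

N_STATES = 3   # hypothetical traffic levels: 0 = low, 1 = medium, 2 = high
N_ACTIONS = 2  # 0 = put the BS to sleep, 1 = keep the BS active

# Hypothetical Markov transition probabilities between traffic levels.
TRANSITIONS = [
    [0.7, 0.3, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
]


def cost(state, action):
    """Illustrative cost: energy when active, service penalty when asleep."""
    energy = 1.0 if action == 1 else 0.1
    penalty = 0.0 if action == 1 else 1.5 * state
    return energy + penalty


def softmax(prefs):
    """Boltzmann distribution over action preferences."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps, transferred=None, seed=0,
          alpha=0.1, beta=0.05, gamma=0.9, decay=0.05):
    """Run actor-critic learning; return (native preferences, average cost)."""
    rng = random.Random(seed)
    value = [0.0] * N_STATES                                # critic
    native = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor
    visits = [0] * N_STATES
    state, total_cost = 0, 0.0
    for _ in range(steps):
        visits[state] += 1
        if transferred is None:
            overall = native[state]
        else:
            # Assumed rule: the transfer rate shrinks as a state is revisited,
            # so locally learned preferences gradually take over.
            zeta = 1.0 / (1.0 + decay * visits[state])
            overall = [(1.0 - zeta) * n + zeta * e
                       for n, e in zip(native[state], transferred[state])]
        action = rng.choices(range(N_ACTIONS), weights=softmax(overall))[0]
        step_cost = cost(state, action)
        next_state = rng.choices(range(N_STATES),
                                 weights=TRANSITIONS[state])[0]
        # Temporal-difference error, with reward defined as negative cost.
        td_error = -step_cost + gamma * value[next_state] - value[state]
        value[state] += alpha * td_error            # critic update
        native[state][action] += beta * td_error    # actor update
        total_cost += step_cost
        state = next_state
    return native, total_cost / steps
```

Calling `train(200)` and `train(200, transferred=prior)`, where `prior` is a hypothetical table of preferences learned in a historical period or a neighboring region, and comparing the returned average costs gives a rough feel for the jumpstart effect that transfer is meant to provide. Because the action does not influence the traffic transitions in this toy model, both runs see the same traffic trace for a given seed, which keeps the comparison fair.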