On Markov Chains, Martingales, and Stochastic Stabilization

- Chapter 13 of Meyn and Tweedie.

- R. L. Tweedie, "Drift conditions and invariant measures for Markov chains." Stoch. Process Appl., 92.2 (2001): 345-354.

- O. Hernandez-Lerma and J.B. Lasserre JB. Ergodic theorems and ergodic decomposition for Markov chains. Acta Applicandae Mathematica. pp:99-119, 1998.

- J. B. Lasserre. Invariant probabilities for Markov chains on a metric space. Statistics and Probability Letters, vol. 34, pp. 259-265, 1997.

- R. Douc, G. Fort, E.Moulines, and P. Soulier. Practical drift conditions for subgeometric rates of convergence. Ann. Appl. Probab, 14:1353-1377, 2004.

- S. B. Connor and G. Fort. State-dependent Foster-Lyapunov criteria for subgeometric convergence of Markov chains. Stoch. Process Appl., 119:176-4193, 2009.

- G.O. Roberts and J. Rosenthal, General state space Markov chains and MCMC algorithms, Probability Surveys, 20-71, 2004.

- L. Breiman. The individual ergodic theorem of information theory, Annals of Mathematical Statistics, pp. 809-811, 1957.

- R. Pemantle and J. S. Rosenthal. Moment conditions for a sequence with negative drift to be uniformly bounded in l_r. Stochastic Processes and their Applications, 82(1):143-155, 1999.

- B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied probability, pages 502-525, 1982.

- P. Tuominen and R.L. Tweedie. Subgeometric rates of convergence of f-ergodic Markov chains. Adv. Annals Appl. Prob., 26(3):775–798, September 1994.

On Dynamic Programming and Optimization Methods

- C. J. Himmelberg, T. Parthasarathy, and F. S. VanVleck. Optimal plans for dynamic programming problems. Mathematics of Operations Research, pp. 390-394, 1976.

- M. Schal. Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrscheinlichkeitsth, pp. 179-296, 1975.

- V. S. Borkar. Average cost dynamic programming equations for controlled Markov chains with partial observations. SIAM Journal on Control and Optimization, 39(3):673-681, 2000.

- D. Blackwell. Memoryless Strategies in Finite-Stage Dynamic Programming. Annals of Math. Statist. Volume 35, Number 2, 863-865, 1964.

- J. B. Lasserre. Sample-path average optimality for Markov control processes. IEEE Trans. Automatic Contr. 44, pp. 1966--1971.

- C. Striebel. Sufficient statistics in the optimum control of stochastic systems. J. Math. Anal. Appl., vol. 12, pp. 576-592, 1965.

- K.W. Ross. Randomized and past dependent policies for markov decision processes with finite action set. Operations Research, 37:474-477, (1989).

- O. Hernandez-Lerma, J. Gonzales-Hernandez, and R. R. Lopez-Martinez. Constrained average cost Markov control processes in Borel spaces. SIAM J. Cont. and Opt. v. 42, pp. 442-468, 2003.

- T. T. Georgiou and A. Lindquist. "The separation principle in stochastic control, redux."

- S.E. Shreve and D.P. Bertsekas. Universally measurable policies in dynamic programming. Math. Oper. Res., 4(1):15–30, Feb 1979.

On Learning and Approximation Algorithms for Markov Decision Processes

- E. D. Vecchia and D. M. Silvia Di and J-M. Alain. Illustrated review of convergence conditions of the value iteration algorithm and the rolling horizon procedure for average-cost MDPs. Annals of Operations Research pp: 193-214, 2012.

- H. Langen. Convergence of dynamic programming models. Math. Oper. Res., 6(4):493–512, 1981.

- B.V. Roy. Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res., 31(2):234–244, May 2006.

- O. Hernandez-Lerma and J.B Lasserre: Error bounds for rolling horizon policies in discrete-time Markov control processes. IEEE Transactions on Automatic Control 35, 1118-1124 (1990)

- D. P. Bertsekas. A new value iteration method for the average cost dynamic programming problem. SIAM journal on control and optimization 36.2 (1998): 742-759.

- J. N. Tsitsiklis. Asynchronous Stochastic Approximation and Q-learning. Machine Learning, 16, 1994, pp. 185-202.

- T. Jaakkola, M. I. Jordan and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural computation,

- E. Even-Dar and Y. Mansour. Learning rates for Q-learning. The Journal of Machine Learning Research, 5:1-25, 2004.

- V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization, pp. 447-469, 2000.

- S. Mannor and J. N. Tsitsiklis. On the empirical state-action frequencies in Markov decision processes under general policies. Mathematics of Operations Research, 30(3):545, 2005.

- J. N. Tsitsiklis and B. Van Roy. Average Cost Temporal-Difference Learning. Proceedings of the IEEE Conference on Decision and Control, 1997.

- G, Neu, A. György, C. Szepesvári, and A. Antos. Online Markov Decision Processes under Bandit Feedback. IEEE Transactions on Automatic Control, 59, pp. 676--691, 2013.

- B. Hajek. A tutorial survey of theory and applications of simulated annealing. Conference on Decision and Control, 1985.

- B. Hajek. Cooling schedules for optimal annealing. Math. Operations Research, 1988.

-H. S Chang and S. I. Marcus. Approximate receding horizon approach for Markov decision processes: average reward case. Journal of Mathematical Analysis and Applications pp. 636-651 (2003)

- O. Hernández-Lerma. Finite-state approximations for denumerable multidimensional state discounted Markov decision processes. J. Math. Anal. Appl., vol. 113, pp. 382-388, 1986.

- L. Sennott. The computation of average optimal policies in denumerable state Markov decision chains. Advances in Applied Probability, pp. 114-137, 1997

- D. White. Finite-state approximations for denumerable state infinite horizon discounted Markov decision processes. J. Math. Anal. Appl., vol. 74, pp. 292-295, 1980.

- F. Dufuor, T. Prieto-Rumeau. Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities. Stochastics, (2014), pp. 1-35.

- E. Zhou, M. C. Fu, and S. I. Marcus. Solving continuous-state POMDPs via density projection. IEEE Transactions on Automatic Control, 55(5):1101–1116, 2010.

On Filtering Problems and Applications

- A. Budhiraja, and D. Ocone. Exponential stability of discrete-time filters for bounded observation noise. Systems & Control Letters

- Z. Khan, T. Balch, and Frank Dellaert. MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Pattern Analysis and Machine Intelligence, vol. 27, pp. 1805-1819, 2005.

- A. Doucet, N. J. Gordon, and V. Krishnamurthy. Particle filters for state estimation of jump Markov linear systems. IEEE Trans. Signal Processing, vol. 49 pp. 613-624, 2001.

- R. Van Handel.
Discrete time nonlinear filters with informative observations are
stable. Electronic Commun. Probability, vol. 13 pp: 562-575, 2008.

- P. Chigansky, R. Liptser R, and R. van Handel. Intrinsic methods
in filter stability. Handbook of Nonlinear Filtering, 2009.Decentralized Stochastic Systems: Teams and Games

- H. S. Witsenhausen. Separation of estimation and control for discrete time systems. Proc. IEEE, 59, 1971

- H. S. Witsenhausen. A counterexample in stochastic optimum control. SIAM Journal on Control and Optimization vol. 6, pp. 131-147, 1968.

- N. Sandell and M. Athans. Solution of some nonclassical LQG stochastic decision problems. IEEE Trans. Automatic Control , vol. 19, pp. 108-116, 1974.

- H. S. Witsenhausen. Equivalent stochastic control problems. Mathematics of Control, Signals, and Systems , vol. 1, pp. 3-11, Springer-Verlag, 1988.

- H. S. Witsenhausen. A standard form for sequential stochastic control. Math. Syst. Theor. 5-11 (1973).

- D. Teneketzis, On information structures and nonsequential stochastic control, CWI Quart., 9 (1996), pp. 241–260.

- R. Radner. Team decision problems,'' Annals of Mathematical Statistics , vol. 33, pp. 857 - 881, 1962.

- J.C. Krainak and S.I. Marcus. Static team problems - part I: Sufficient conditions and the exponential cost criterion. IEEE Trans. Automat. Contr., vol. 27, pp. 839-848, April 1982.

- Chapter 5 of the dissertation of J.N. Tsitsiklis. Problems in Decentralized Decision Making and Computation" Ph.D. Thesis, EECS, MIT, 1984

- J. S. Shamma and G. Arslan. Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria. IEEE Trans. Automat. Control, vol. 50, no. 3, pp. 312-327, Mar. 2005.

- J. N. Tsitsiklis and M. Athans. Convergence and asymptotic agreement in distributed decision problems. IEEE Trans Autom. Control, vol. AC-29, no. 1, pp. 42-50, Jan. 1984.

- V. Borkar and P. Varaiya, Asymptotic agreement in distributed estimation. IEEE Trans. Automar. Contr. pp. 650-655, June 1982.

- L.T. Nielsen. Common knowledge, communication, and convergence of beliefs. Math. Sot. Sci. 8, 1-14, (1984).

- O. Gossner. Simple Bounds on the Value of a Reputation." Econometrica 79.5 (2011): 1627-1641.

- W. Sandholm. Stochastic Evolutionary Game Dynamics: Foundations, Deterministic Approximation, and Equilibrium Selection". In

- E. Kalai and E. Lehrer. Weak and strong merging of opinions". Journal of Mathematical Economics, 23:73-100, 1994

- J. Nachbar. Beliefs in repeated games". Econometrica, 73:459-480, 2005.

- J. Nachbar. Learning in Games. in R. A Meyers and P. Kokol, eds. Computational complexity: theory, techniques, and applications. pp. 1695-1705. Springer New York, 2012.

- B. Jovanovic, Boyan, and R. W. Rosenthal. "Anonymous sequential games. Journal of Mathematical Economics, 1988.

- T. Basar, “Decentralized multicriteria optimization of linear stochastic systems,” IEEE Transactions on Automatic Control, vol. 23, no. 2, pp. 233–243, April 1978

- D. Jacobson. Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Transactions on Automatic control, 18(2):124-131, 1973.

- P. Dai Pra, L. Meneghini, and W. J. Runggaldier. Connections between stochastic control and dynamic games.

- P. Dupuis, M. R. James, and I. Petersen. Robust properties of risk-sensitive control. Mathematics of Control, Signals and Systems, 13(4):318-332, 2000.

- P. Whittle. A risk-sensitive maximum principle: The case of imperfect state observation.

- I. Tzortzis I, C.D. Charalambous CD and T. Charalambous. Dynamic programming subject to total variation distance ambiguity. SIAM Journal on Control and Optimization. pp. 2040-75, 2015.

On Structural Results for Policies and Applications in Communications, Networks, Finance

- J. C. Walrand and P. Varaiya. Optimal causal coding-decoding problems. IEEE Trans. Inform. Theory, 19:814-820, November 1983 (this will be supplemented by a paper or more, depending on your group)

- S. Tatikonda, and S. Mitter. The capacity of channels with feedback. IEEE Trans. Inform. Theory, 323-349, 2009.

- A. Nayyar, T. Basar, D.Teneketzis and V.V. Veeravalli. Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor. IEEE Transactions on Automatic Control, Vol. 58, No. 9, pp. 2246-2260, September, 2013.

- W. Wu and A. Arapostathis. Optimal sensor querying: General markovian and LQG models with controlled observations. IEEE Transactions on Automatic Control, vol. 53, no. 6, pp. 1392 -1405, July 2008.

- R. Agrawal, M. V. Hegde, and D. Teneketzis. Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Transactions on Automatic Control, 33:899-906, 1988.

- B. Hajek, K. Mitzel, and S. Yang. Paging and registration in cellular networks: jointly optimal policies and an iterative algorithm. IEEE Transactions on Information Theory, 54, 2008).

Continuous-time Stochastic Control

- First three chapters of 'Stochastic Stability and Control', by Kushner.

- Chapter 4 of 'Stochastic Stability and Control', by Kushner

- S. P. Meyn and R. L. Tweedie. Stability of Markovian processes iii: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Probability, pages 518-548, 1993.

- R. J. Elliott. The optimal control of a stochastic system. SIAM Journal on Control and Optimization, pp. 756-778, 1977.

- M. Davis and P Varaiya. Dynamic programming conditions for partially observable stochastic systems. SIAM Journal on Control, pp.226-61, 1973.

Implementation of Iterative Algorithms Leading to Convergence to Optimal Policies

- You can also choose to implement policy iteration algorithms, value iteration algorithms or linear programming algorithm for a given sample problem as part of your project. You will be provided with references. Please talk to the instructor if you wish to implement such an algorithm and present your results in class. You will need to include your (Matlab) Codes in your report.

- The reports should be less than 8 pages and roughly contain the following: 1/3 of the report should focus on the problem description, known results and a literature review sufficient enough to reflect your knowledge of the field;1/3 of the report on the paper's main contributions and results; you do not need to give details of the proofs; and 1/3 of the report should be on your review and critique of the paper.