These are summary notes from Leonard Susskind's online course on classical mechanics.

The principle of least action

Curiously, many physical laws can be described in the following way.

First, suppose that a physical state is determined by a position $q$ (which could be very high-dimensional if you're describing many particles) and a velocity $\dot q$. That is, we suppose that you can specify a position and velocity independently, and doing so determines the future of the system. If you want your physical states to require more derivatives, you could do that, or if you wanted them to depend on time, you could, but let's not do that now. So our coordinates on state space are $q$ and $\dot q$.

You define a Lagrangian function $L(q,\dot q)$. Then for any path in state space $q(t)$, you can define the action of the path to be
$$
S = \int L(q,\dot q)\, dt.
$$
Then you declare your physical law to be that the path actually taken has minimal action (or rather stationary action; i.e. any deformation of the path, leaving the endpoints fixed, doesn't change the action to first order).

This is not a fact; it's just a way of turning a function of $q$ and $\dot q$ into a physical law. It's not at all clear (to me, right now) why every physical law should be of this form, why the law of mechanics ($m\ddot q = -\frac{\partial V}{\partial q}$) should be of this form, or what the meaning of the action or of the Lagrangian function is.

For classical mechanics, the Lagrangian is the kinetic energy minus the potential energy, $L(x,\dot x) = \frac m2 \dot x^2 - V(x)$.
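
As a sanity check on the claim that the Newtonian path makes the action stationary, here is a minimal numerical sketch. Everything in it is my own assumption rather than something from the course: a uniform gravitational potential $V(x)=mgx$, the particular values of `m`, `g`, `T`, and `N`, a sine-shaped deformation, and a midpoint-rule approximation of the integral.

```python
import math

# Hypothetical parameters: mass, gravitational acceleration, total time, number of time steps.
m, g, T, N = 1.0, 9.8, 1.0, 1000
dt = T / N

def lagrangian(x, v):
    """Kinetic minus potential energy, assuming V(x) = m*g*x."""
    return 0.5 * m * v**2 - m * g * x

def path(eps):
    """Newtonian path from x(0)=0 to x(T)=0, plus eps times a bump that vanishes at both ends."""
    ts = [k * dt for k in range(N + 1)]
    return [0.5 * g * t * (T - t) + eps * math.sin(math.pi * t / T) for t in ts]

def action(xs):
    """Midpoint-rule approximation of S = integral of L dt along the sampled path."""
    total = 0.0
    for a, b in zip(xs, xs[1:]):
        total += lagrangian(0.5 * (a + b), (b - a) / dt) * dt
    return total

S0 = action(path(0.0))
for eps in (-0.1, -0.01, 0.01, 0.1):
    # Change in action caused by deforming the path by eps times the bump.
    print(eps, action(path(eps)) - S0)
```

The printed differences should be positive and grow like the square of `eps`, which is what "no change in the action to first order" looks like numerically.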

The Euler-Lagrange equation

The non-local condition that the action is stationary can be turned into local differential equations which look like more familiar laws of motion. Suppose we'd found a trajectory $q(t)$ which minimized the action. What properties would it have? If you considered a family of nearby paths with the same endpoints $q_x(t)=q(t)+x\delta(t)$, then the derivative of the action at $x=0$ should be zero. That's what it means for the action to be stationary to first order. So we compute that derivative, using the fact that $\frac{d}{dx}\Bigr|_{x=0}q_x(t)=\delta(t)$ and $\frac{d}{dx}\Bigr|_{x=0}\dot q_x(t)=\dot \delta(t)$, and using integration by parts:
$$\begin{align*}
0 = \frac{d}{dx}\Biggr|_{x=0}S &= \int \Bigl(\frac{\partial L}{\partial q}\delta(t) + \frac{\partial L}{\partial \dot q} \dot\delta(t)\Bigr)\,dt\\
&= \frac{\partial L}{\partial \dot q} \delta(t)\Biggr|_{t_\text{start}}^{t_\text{end}} + \int \Bigl(\frac{\partial L}{\partial q}\delta(t) - \frac{d}{dt}\frac{\partial L}{\partial \dot q} \delta(t)\Bigr)\,dt
\end{align*}$$
Since $\delta(t_\text{start})=\delta(t_\text{end})=0$ (the endpoints of the path aren't allowed to wiggle), the first term is zero, so the last integral has to be zero, no matter what $\delta$ we chose to start with. So the part of the integrand multiplying $\delta$ has to be zero. This is the Euler-Lagrange equation. For convenience, we define the canonical momentum $p$ to be the derivative $\partial_{\dot q}L$.
$$\boxed{\displaystyle
p := \frac{\partial L}{\partial \dot q},\qquad \dot p = \frac{\partial L}{\partial q}.
}$$

It often happens that we want to differentiate something without specifying exactly what we're differentiating with respect to. It would have been nice to run the above argument without specifying $x$ and $\delta$ explicitly. We'll use the following notation for it: $\delta L = \partial_q L \delta q + \partial_{\dot q} L \delta \dot q$ (note: $\delta \dot q$ is the time derivative of $\delta q$).

In classical mechanics, we have that $p_i = \partial_{\dot x_i}L = m\dot x_i$ and $\dot p_i = m\ddot x_i = -\frac{\partial V}{\partial x_i}$ for all $i$. That's good old Newton's law!

A big advantage of the Lagrangian formulation is that it makes it easy to change coordinates. For example, suppose we wanted to work out the laws of motion as they appear to someone on a merry-go-round. Then we could just figure out the coordinate transformation, rewrite the Lagrangian in those rotating coordinates, and extract the Euler-Lagrange equation as the equation of motion. That's much easier than figuring out how velocities and forces transform into the new coordinate system.
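
Here is a sketch of how that goes in the simplest case I can think of. The setup is my assumption, not the course's: a free particle in the plane with inertial coordinates $(X,Y)$, and a merry-go-round turning at constant angular velocity $\omega$, so that the rotating coordinates $(x,y)$ satisfy $X = x\cos\omega t - y\sin\omega t$ and $Y = x\sin\omega t + y\cos\omega t$. Substituting into $\frac m2(\dot X^2+\dot Y^2)$ and applying Euler-Lagrange gives
$$\begin{align*}
L &= \frac m2(\dot x^2+\dot y^2) + m\omega(x\dot y - y\dot x) + \frac m2\omega^2(x^2+y^2),\\
m\ddot x &= 2m\omega\dot y + m\omega^2 x,\\
m\ddot y &= -2m\omega\dot x + m\omega^2 y.
\end{align*}$$
The terms proportional to $\omega$ are the Coriolis force and the terms proportional to $\omega^2$ are the centrifugal force, and they fell out without ever transforming a force.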

As another example, consider the double pendulum. To work out what the equations of motion are, instead of drawing free body diagrams, you can pick coordinates that you like, express kinetic energy minus potential energy in those coordinates, and extract the Euler-Lagrange equation.
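
For concreteness, here is what that Lagrangian looks like under assumptions of my own choosing: point masses $m_1$ and $m_2$ on massless rods of lengths $l_1$ and $l_2$, gravitational acceleration $g$, and angles $\theta_1,\theta_2$ measured from the downward vertical.
$$\begin{align*}
L &= \frac12(m_1+m_2)l_1^2\dot\theta_1^2 + \frac12 m_2 l_2^2\dot\theta_2^2 + m_2l_1l_2\dot\theta_1\dot\theta_2\cos(\theta_1-\theta_2)\\
&\quad + (m_1+m_2)gl_1\cos\theta_1 + m_2gl_2\cos\theta_2.
\end{align*}$$
The first line is the kinetic energy and the second line is minus the potential energy. The equations of motion are then $\frac{d}{dt}\partial_{\dot\theta_i}L = \partial_{\theta_i}L$ for $i=1,2$.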

Spatial symmetries $\Leftrightarrow$ conserved momenta

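Suppose the Lagrangian is invariant under some continuous spatial symmetry. Consider a vector field in the $q$ direction, depending only on $q$ (not $\dot q$), $\sum f_i(q)\partial_{q^i}$. Flowing along it shifts $q^i$ at rate $f_i(q)$, and therefore shifts $\dot q^i$ at rate $\dot f_i = \frac{d}{dt}f_i(q(t))$. By Euler-Lagrange, the resulting rate of change of the Lagrangian is
$$
\sum \Bigl(f_i(q) \partial_{q^i}L + \dot f_i \partial_{\dot q^i}L\Bigr) = \sum \Bigl(f_i(q) \dot p_i + \dot f_i p_i\Bigr) = \frac{d}{dt} \Bigl(\sum f_i(q)p_i\Bigr).
$$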
That's very interesting. It says that the Lagrangian is invariant under the vector field if and only if the momentum $\sum f_i(q)p_i$ is conserved.

For classical mechanics, this tells us that the Lagrangian is invariant under translation in the $x$ direction if and only if the momentum in the $x$ direction is conserved, or that the Lagrangian is invariant under rotation about some axis if and only if angular momentum about that axis is conserved. Neat!

As a special case, if some $q^i$ doesn't appear in the Lagrangian at all (but $\dot q^i$ may appear), it's called a cyclic variable. It's clear that the Lagrangian is invariant under translation in the $q^i$ direction, so the canonical momentum $p_i$ is conserved.
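
A standard illustration (my choice of example, with the usual polar coordinates $r,\theta$ as assumed symbols) is a particle moving in the plane in a potential that depends only on the distance from the origin:
$$\begin{align*}
L &= \frac m2\bigl(\dot r^2 + r^2\dot\theta^2\bigr) - V(r),\\
p_\theta &= \frac{\partial L}{\partial\dot\theta} = mr^2\dot\theta,\qquad \dot p_\theta = \frac{\partial L}{\partial\theta} = 0.
\end{align*}$$
Here $\theta$ is cyclic, and the conserved canonical momentum $p_\theta$ is the angular momentum about the origin.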

Noether's theorem is a more general form of this relationship, but I haven't worked on digesting that yet.

The Hamiltonian formulation

The classical Lagrangian has units of energy, but it's not the total energy of the system, which is weird. In particular, the value of the Lagrangian isn't conserved. We can compute
$$
\frac{d}{dt} L = \frac{\partial L}{\partial q} \dot q + \frac{\partial L}{\partial \dot q} \ddot q = \dot p \dot q + p \ddot q = \frac{d}{dt} (p\dot q).
$$
So the Hamiltonian $H:=p\dot q - L$ is conserved. For classical mechanics, $H=m\dot q^2 - (\frac m2 \dot q^2 - V(q)) = \frac m2 \dot q^2 + V(q)$, which is the total energy. In other systems, this conserved quantity plays the role of the total energy.

When thinking about the Hamiltonian, we usually think of $q$ and $p$ as being the coordinates, rather than $q$ and $\dot q$. Lagrangians for which $\dot q$ cannot be solved for in terms of $p$ are considered bad.

We can compute the variation of the Hamiltonian under unspecified variations of $q$ and $p$:
$$\begin{align*}
\delta H &= \delta p \dot q + p\delta \dot q - \frac{\partial L}{\partial q}\delta q - \frac{\partial L}{\partial \dot q}\delta \dot q\\
&= \delta p \dot q + p\delta\dot q - \dot p\delta q - p\delta \dot q\\
&= \dot q \delta p - \dot p \delta q. \end{align*}$$
But in the variation of $H$ under arbitrary variations of $p$ and $q$, the multiplier of $\delta p$ must be $\partial_p H$, and the multiplier of $\delta q$ must be $\partial_q H$. So we get the Hamiltonian equations of motion.
$$\boxed{\displaystyle
\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}.
}$$
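
To see these equations in action, here is a small sketch that integrates them numerically. The harmonic-oscillator Hamiltonian, the values of `m` and `k`, and the leapfrog (kick-drift-kick) stepping scheme are my choices for illustration, not something from the course. Leapfrog as written relies on $H$ splitting into a function of $p$ plus a function of $q$.

```python
import math

m, k = 1.0, 4.0   # hypothetical mass and spring constant

def H(q, p):
    """Harmonic-oscillator Hamiltonian: kinetic plus potential energy."""
    return p**2 / (2 * m) + 0.5 * k * q**2

def dH_dq(q, p):
    return k * q

def dH_dp(q, p):
    return p / m

def evolve(q, p, dt, steps):
    """Leapfrog steps of Hamilton's equations: q' = dH/dp, p' = -dH/dq."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dq(q, p)   # half kick
        q += dt * dH_dp(q, p)         # full drift
        p -= 0.5 * dt * dH_dq(q, p)   # half kick
    return q, p

q0, p0 = 1.0, 0.0
dt, steps = 1e-3, 5000
q1, p1 = evolve(q0, p0, dt, steps)

omega = math.sqrt(k / m)
print("energy before and after:", H(q0, p0), H(q1, p1))
print("q from integration and exact:", q1, math.cos(omega * dt * steps))
```

The energy should stay constant up to a small error that shrinks with the step size, and `q1` should land close to the exact answer $\cos(\omega t)$ with $\omega=\sqrt{k/m}$.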

The Poisson bracket

Given any function on phase space $f(q,p)$, its time derivative is
$$\begin{align*}
\frac{df}{dt} &= \frac{\partial f}{\partial q}\dot q + \frac{\partial f}{\partial p}\dot p\\
&= \frac{\partial f}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial H}{\partial q}.
\end{align*}$$
This structure comes up a lot, so we define the Poisson bracket of two functions on phase space to be $\{f,g\} = \partial_q f\partial_p g - \partial_p f\partial_q g$. Then the above calculation says that to see how any function is going to evolve in time, you just bracket with the Hamiltonian: $\dot f = \{f,H\}$. We say that "the Hamiltonian generates evolution in time."
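
Here is a minimal numerical sketch of the bracket for one degree of freedom. The finite-difference approach, the step `h`, and the harmonic-oscillator test Hamiltonian are assumptions I've made for illustration.

```python
def poisson_bracket(f, g, q, p, h=1e-5):
    """Central-difference estimate of {f,g} = df/dq * dg/dp - df/dp * dg/dq at (q, p)."""
    df_dq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return df_dq * dg_dp - df_dp * dg_dq

m, k = 1.0, 4.0   # hypothetical mass and spring constant

def H(q, p):
    return p**2 / (2 * m) + 0.5 * k * q**2

def position(q, p):
    return q

def momentum(q, p):
    return p

q, p = 0.5, -1.2   # an arbitrary point in phase space
print(poisson_bracket(position, momentum, q, p))  # should be close to 1
print(poisson_bracket(position, H, q, p))         # should be close to p/m, i.e. q-dot
print(poisson_bracket(momentum, H, q, p))         # should be close to -k*q, i.e. p-dot
print(poisson_bracket(H, H, q, p))                # should be close to 0: H is conserved
```

The middle two lines are just Hamilton's equations read off from $\dot f=\{f,H\}$ with $f=q$ and $f=p$.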

Similarly, the momentum $\sum f_i(q)p_i$ generates spatial translation along the vector field $\sum f_i(q)\partial_{q^i}$ (i.e. bracketing with the momentum is the same as differentiating along the vector field). This follows immediately from the observation that $\{-,p_i\} = \partial_{q^i}$, that bracketing with anything is a derivation, and that $\{q^i,q^j\}=\{p_i,p_j\}=0$ (and that functions of $q$ can be approximated by polynomials in $q$).

As an example of how this is handy, Susskind shows how you'd use Poisson brackets to compute the motion of a gyroscope in a gravitational field. The Hamiltonian is the energy, which we can compute in terms of the angular momentum and the moment of inertia, $H = \frac{1}{2I}(L_x^2+L_y^2+L_z^2) + mg\,kL_z$, where $g$ is the gravitational acceleration and $k$ is chosen so that $kL_z$ is the height of the center of mass of the gyroscope. Then you can compute the time evolution of $L_x$, $L_y$, and $L_z$ (i.e. the axis of the gyroscope) by taking the bracket with $H$ and chugging away using properties of the Poisson bracket. This is particularly easy since you know, for example, that $L_z$ generates rotation about the $z$-axis, so for any vector quantity $v$, including angular momentum, $\{v_x,L_z\} = -v_y$, $\{v_y,L_z\}=v_x$, and $\{v_z,L_z\}=0$.
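
To make the chugging concrete, here is a sketch of the computation, under my assumption that the potential energy term is $mg\,kL_z$ (mass times gravitational acceleration $g$ times the height $kL_z$ of the center of mass). Cycling $x\to y\to z$ in the rotation brackets gives $\{L_x,L_y\}=L_z$ and $\{L_x,L_z\}=-L_y$, so $L_x^2+L_y^2+L_z^2$ brackets to zero with each component, and only the gravity term contributes:
$$\begin{align*}
\dot L_x &= \{L_x,H\} = mgk\,\{L_x,L_z\} = -mgk\,L_y,\\
\dot L_y &= \{L_y,H\} = mgk\,\{L_y,L_z\} = mgk\,L_x,\\
\dot L_z &= \{L_z,H\} = 0.
\end{align*}$$
So $L_z$ is constant while $(L_x,L_y)$ rotates about the $z$-axis at angular frequency $mgk$: the gyroscope precesses.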

Electromagnetic forces

To recover classical electrodynamics, with time-independent electric field $E=\nabla V$ ($E$ is curl-free), time-independent magnetic field $B=\nabla \times A$ ($B$ is divergence-free), and a particle of charge $Q$, we take the Lagrangian to be $L(x, \dot x) = \frac{m}{2} \dot x^2 + \frac{Q}{4\pi\epsilon_0}\bigl(V(x) + \frac{\dot x}{c}\cdot A\bigr)$. Then the canonical momentum is $p = m\dot x + \frac{Q}{4\pi\epsilon_0c}A$, so the Euler-Lagrange equations are $m\ddot x_i + \frac{Q}{4\pi\epsilon_0c} \partial_j A_i \dot x_j = \frac{Q}{4\pi\epsilon_0}(\partial_i V + \frac{1}{c}\dot x\cdot \partial_i A)$, with the repeated index $j$ summed over. Once we regroup and simplify everything, we get
$$
m\ddot x = \frac{Q}{4\pi\epsilon_0}\Bigl(\nabla V + \frac{1}{c}\dot x\times (\nabla\times A)\Bigr).
$$
Yay! That's what we're familiar with.

One thing to notice is that the final equations of motion only depend on $\nabla V=E$ and $\nabla\times A = B$, so we can change $V$ by a constant (which is boring), or change $A$ by the gradient of a function (which is interesting). The Lagrangian doesn't actually stay constant under these operations, but the paths of least action stay the same, so the physical law is the same. Changing these potentials in this way is called applying a gauge transformation. A really handy thing to do is to change the potential (without changing $B$) to create cyclic variables so that you can easily spot conserved momenta.
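
Here is a sketch of why the paths don't change, plus an example of the cyclic-variable trick. The function $\chi$ and the uniform field strength $B_0$ are symbols I'm introducing for illustration. The velocity-dependent term in $L$ is a constant multiple of $\dot x\cdot A$, so replacing $A$ by $A+\nabla\chi$ changes $L$ by that same constant times
$$
\dot x\cdot\nabla\chi = \frac{d}{dt}\chi(x(t)).
$$
The action therefore changes by a multiple of $\chi(x(t_\text{end}))-\chi(x(t_\text{start}))$, which is the same for every path with the given endpoints, so the stationary paths are unchanged. For a uniform field $B=(0,0,B_0)$, two possible potentials are
$$
A = (-B_0y,\,0,\,0) \qquad\text{and}\qquad A' = (0,\,B_0x,\,0), \qquad A'-A = \nabla(B_0xy).
$$
If $V$ doesn't depend on $x$, then with $A$ the coordinate $x$ is cyclic and $p_x$ is conserved. If $V$ doesn't depend on $y$, then with $A'$ the coordinate $y$ is cyclic and $p_y$ is conserved.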