It's important to understand that there are two aspects to thread safety:
(1) Execution control, and
(2) Memory visibility.
The first has to do with controlling when code executes (including the order in which instructions are executed) and whether it can execute concurrently; the second has to do with when the effects in memory of what one thread has done become visible to other threads.
Because each CPU has several levels of cache between it and main memory, threads running on different CPUs or cores can see "memory" differently at any given moment, because threads are permitted to obtain and work on private copies of main memory.
Using synchronized prevents any other thread from obtaining the monitor (or lock) for the same object, thereby preventing all code blocks protected by synchronization on the same object from executing concurrently.
Importantly, synchronization also creates a "happens-before" memory barrier, resulting in a memory visibility constraint such that anything done up to the point some thread releases a lock appears to another thread subsequently acquiring the same lock to have happened before it acquired the lock.
In practical terms, on current hardware, this typically causes flushing of the CPU caches when a monitor is acquired and writes to main memory when it is released, both of which are expensive (relatively speaking).
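To make this concrete, here is a minimal sketch (my own illustrative class, not from any particular library) of two threads incrementing a shared counter. The synchronized methods provide both halves of thread safety described above: mutual exclusion over the read-modify-write, and the happens-before edge that makes the final value visible to the thread that reads it.

```java
// Two threads increment a shared counter. synchronized gives mutual
// exclusion (the count++ read-modify-write cannot interleave) and a
// happens-before edge (the final value is visible after the lock is
// released and reacquired).
public class SyncCounter {
    private int count = 0;                 // guarded by this object's monitor

    public synchronized void increment() { // only one thread at a time
        count++;
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) c.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(c.get()); // prints 200000; without synchronized,
                                     // lost updates could make it smaller
    }
}
```

If increment() were not synchronized, the two unguarded count++ operations could interleave and updates would be lost, and even the ones that succeeded might not be visible to the main thread.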
Using volatile, on the other hand, forces all accesses (read or write) to the volatile variable to occur to main memory, effectively keeping the volatile variable out of CPU caches.
This can be useful for some actions where it is simply required that visibility of the variable is correct and order of accesses is not important.
Using volatile also changes treatment of long and double to require accesses to them to be atomic; on some (older) hardware this might require locks, though not on modern 64-bit hardware.
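A brief sketch of that last point (hypothetical class; the underlying rule is JLS §17.7, which permits non-volatile long and double accesses to be split into two 32-bit halves). Declaring the field volatile guarantees that a reader can never observe a "torn" value assembled from the halves of two different writes:

```java
// volatile guarantees atomic 64-bit reads and writes of long (and
// double), even on 32-bit JVMs where a plain long access may be split
// into two 32-bit operations (JLS 17.7).
public class TickCounter {
    private volatile long ticks; // read/written atomically as one unit

    public void set(long t) { ticks = t; }
    public long get()       { return ticks; }

    public static void main(String[] args) {
        TickCounter c = new TickCounter();
        c.set(0xFFFF_FFFF_0000_0000L); // a value whose 32-bit halves differ
        System.out.println(c.get() == 0xFFFF_FFFF_0000_0000L); // prints true
    }
}
```

Note that volatile makes the individual read or write atomic; it does not make compound operations like ticks++ atomic — that still needs synchronization (or an atomic class).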