Okay.
The activation function you're suggesting first appeared with the Perceptron, in 1958. Before that there were only McCulloch-Pitts neurons, whose "activation" was just a fixed hard threshold.
The function you're suggesting is an f(x) which, as you say, is deterministic. The reason there's an activation function at all is to make the neuron's response nonlinear, because linear classification is rather boring.
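As a minimal sketch of that point (the step and sigmoid below are my own illustrative choices, not anything specific from the discussion), a deterministic activation always maps the same weighted sum to the same output:

```python
import math

def step(x):
    """Deterministic threshold activation: same input, same output, every time."""
    return 1.0 if x >= 0.0 else 0.0

def sigmoid(x):
    """A smooth deterministic nonlinearity; still a fixed f(x)."""
    return 1.0 / (1.0 + math.exp(-x))

# A purely linear neuron can only carve the input space with a flat
# hyperplane, and stacking linear layers stays linear; the nonlinearity
# is what buys any extra expressive power.
weights, bias = [0.7, -0.3], 0.1
inputs = [1.0, 2.0]
pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
print(step(pre_activation), sigmoid(pre_activation))
```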
Sometime around 1972, Shun-Ichi Amari introduced the idea of a stochastic activation function. Instead of a fixed f(x), it is P(f(x)), where P is a probability. He did this to make the network "more biological", to align more closely with real behavior, because this is the way ion channels behave in real life: there is only a "probability" that they will let the ion through.
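Here is a toy sketch of what a stochastic activation might look like in that spirit; the sigmoid firing probability and all the names are my own illustrative choices, not Amari's actual formulation:

```python
import math
import random

def stochastic_fire(x, rng=random):
    """Fire (1) with probability sigmoid(x): a P(f(x)) rather than an f(x).

    Loosely mimics an ion channel that only *probably* opens for a given
    stimulus -- the same input can yield different outputs on different trials.
    """
    p = 1.0 / (1.0 + math.exp(-x))
    return 1 if rng.random() < p else 0

# Only the firing *rate* is deterministic: over many trials it converges
# to sigmoid(x), even though each single trial is a coin flip.
rng = random.Random(0)
trials = [stochastic_fire(0.5, rng) for _ in range(10000)]
rate = sum(trials) / len(trials)
print(rate)  # close to sigmoid(0.5), roughly 0.62
```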
The genius of Hopfield in the early 80's was the realization that population behavior is much more important than single-neuron behavior. Basically, neurons classify patterns into subspaces of the total state space. If the neurons are linear, so are the subspaces. If the neurons are probabilistic, so are the subspaces.
Hopfield DELIBERATELY used a static transfer function to demonstrate the importance of asynchronous updating. His original network is highly non-biological, and simple to the point of ridiculousness, yet he was able to derive an enormous amount of computational power from it, so much so that the entire world took notice. Within a few years, physicists at UPenn had built an optical version of his network that could solve Traveling Salesman problems in under a second. This was astounding not only to physicists, but to mathematicians, biologists, and computer scientists alike.
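To make the asynchronous-update idea concrete, here is a toy sketch of a Hopfield-style network; the Hebbian training rule and the tiny 6-unit pattern are my own illustrative choices, not Hopfield's actual experiments:

```python
import random

def train_hopfield(patterns):
    """Hebbian outer-product rule; weights are symmetric with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def recall(w, state, steps=100, rng=random):
    """Asynchronous updates: one randomly chosen unit flips at a time.

    This one-at-a-time scheduling is what lets the energy function decrease
    monotonically, which is the heart of Hopfield's convergence argument.
    """
    state = list(state)
    n = len(state)
    for _ in range(steps):
        i = rng.randrange(n)
        h = sum(w[i][j] * state[j] for j in range(n))
        state[i] = 1 if h >= 0 else -1
    return state

# Store one +/-1 pattern and recover it from a corrupted probe.
stored = [1, -1, 1, -1, 1, -1]
w = train_hopfield([stored])
noisy = [1, -1, -1, -1, 1, -1]  # one bit flipped
print(recall(w, noisy, rng=random.Random(1)))
```

With a single stored pattern, every asynchronous update pushes the corrupted bit back toward the memory, so the probe settles onto the stored pattern regardless of the update order.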
Today, such power is commonplace. Amari, who introduced the stochastic transfer function, is now considered the godfather of Information Geometry. Whereas a Hopfield network can memorize and approximate any continuous function, an Amari network can memorize and approximate any probability distribution. That's quite an achievement, if you're familiar with Brownian motion and Wiener processes.
Today there is a focus on extracting Volterra kernels from discontinuous nonlinear time series. This would have been completely impossible with deterministic networks. In real life this translates into "how to confuse an AI". If you say "I take my coffee with cream and sugar", most networks can figure out what you're saying. But if you say "I take my coffee with cream and.... (delay).... (cough, stutter).... um.... dog", most networks will become horribly confused.
However, an Amari network will gracefully respond with "that makes no sense, would you mind repeating that please?" It has enough smarts to recognize that the probability of drinking a dog is near zero; whereas a linear classifier will try to show you a picture of a white dog.
Take a quick look at these slides for an intuitive understanding:
yosinski.com