Neural Networks: It’s Not Brain Surgery (But It’s Close!)

In the booming digital era, AI has become a household name, permeating every aspect of life and business. From self-driving cars and Netflix's recommendation systems to large language models like ChatGPT, all of them are built on a powerful and complex technological foundation: Neural Networks. So, what exactly are neural networks, and why do they play such a pivotal role in the remarkable advancement of modern artificial intelligence?

At CoderPush, we believe that a deep understanding of fundamental principles is key to building groundbreaking AI solutions. In a recent special Tech Talk #113, Neo Nguyen, one of our exceptional AI native engineers, took the audience on a journey through the world of neural networks, from their biological inspiration to their practical applications in the most advanced AI systems.

The Human Brain as the Ultimate Inspiration: From Biology to AI

The history of neural network development is intertwined with humanity's effort to simulate the structure and function of the biological brain. The human brain, with its extraordinary ability to learn from past experience and form new connections, has been an endless source of inspiration for computer scientists.

Neo Nguyen shared his inspiring journey, shaped by Martin T. Hagan - author of "Neural Network Design," a classic textbook taught at many top universities in the US. This book, along with Dr. Hagan's unique teaching style, laid a solid foundation for Neo's career in Computer Science and AI.

To truly understand neural networks, we need to go back to the basics of biology. Let's look at the structure of a nerve cell (neuron) - the basic unit that makes up our brain:

  • Dendrites: These are the neuron's "listeners," receiving signals (inputs) from other neurons. In the context of AI, these are essentially the input data.
  • Cell Body: This part acts as a central processing unit, aggregating and processing the received signals.
  • Axon: After processing, the axon transmits signals (outputs) to other neurons. This is the neuron's "speaker."
  • Synapse: This is the connection point between neurons. The strength of the synapse determines the strength of the connection and the ability to process information. A strong synapse means information is transmitted effectively, and the learning process occurs better.

To make it easier to visualize, Neo offered an interesting analogy: 

“Imagine a neural network as an orchestra.” 

The musician is the dendrite – listening to the conductor's instructions. The cell body is the brain processing the music, reading notes, and making decisions. The axon is the instrument, producing sound (output). And the synapse is the sound system, ensuring the audience can clearly hear the music. This analogy helps us intuitively grasp how these components work together to process information.

From Biological Neurons to Mathematical Models: The Core of Neural Networks

Once the biological principles were understood, scientists began to mathematically model the nerve cell. This was a crucial transformation, laying the groundwork for building artificial neural networks.

The basic formula for a single neuron can be expressed as:

A = F(W × P + B)

(Figure: simplified diagram of a neuron, illustrating its fundamental structure as AI's biological inspiration.)

Where:

  • P is the Input - the signal received by the dendrites.
  • A is the Output.
  • F is the Function (activation function) - the axon's role, determining how information is transmitted.
  • W is the Weight - representing the strength of the synapse, influencing the signal's intensity.
  • B is the Bias - a parameter that helps adjust the neuron's activation threshold.
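
To make the formula concrete, here is a minimal sketch of a single neuron in plain Python with NumPy. The input values, weights, and bias are invented purely for illustration, and tanh stands in for whichever activation function the network uses:

```python
import numpy as np

def neuron(p, w, b, f):
    """Compute a single neuron's output: A = F(W × P + B)."""
    n = np.dot(w, p) + b   # weighted sum of the inputs plus the bias
    return f(n)            # the activation function decides what gets passed on

# Illustrative values: three inputs, three weights, one bias
p = np.array([0.5, -1.0, 2.0])   # inputs (what the "dendrites" receive)
w = np.array([0.8, 0.2, -0.5])   # weights (the synapse strengths)
b = 0.1                          # bias (shifts the activation threshold)

a = neuron(p, w, b, f=np.tanh)   # tanh is just one example of an activation function
print(a)
```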

Activation functions (F) play a pivotal role in shaping how a neuron processes and makes decisions. Neo introduced three common activation functions, sketched in code after the list below:

  1. Hard Limit Function: This function returns only 0 or 1, similar to "yes" or "no" decisions, for example in binary classification problems.
  2. Linear Function: The output is identical to the input, often used where the input-output relationship is a simple linear one.
  3. Log-sigmoid Function: This function squashes the output into a range between 0 and 1, often used to represent probabilities. For instance, an image recognition AI might use this function to output an 80% probability that an image is of a dog.
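
As a rough sketch (the formulas follow the standard textbook definitions, and the sample inputs are arbitrary), the three functions can be written as simple NumPy helpers:

```python
import numpy as np

def hard_limit(n):
    """Returns 0 or 1: a hard yes/no decision, as in binary classification."""
    return np.where(n >= 0, 1.0, 0.0)

def linear(n):
    """Output is identical to the input."""
    return n

def log_sigmoid(n):
    """Squashes any value into (0, 1), convenient for representing probabilities."""
    return 1.0 / (1.0 + np.exp(-n))

n = np.array([-2.0, 0.0, 3.0])   # arbitrary sample inputs
print(hard_limit(n))    # [0. 1. 1.]
print(linear(n))        # [-2.  0.  3.]
print(log_sigmoid(n))   # approximately [0.119 0.5 0.953]
```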

Building Complexity: From Single Neuron to Neural Network Architectures

The human brain contains approximately 86 billion to 100 billion neurons, working together in a complex network. Similarly, an artificial neural network is built by connecting many individual neurons into multiple layers:

  • Input Layer: This layer receives the raw input data.
  • Hidden Layers: This is where complex processing takes place, extracting features and hidden relationships from the data. A network with more hidden layers is considered "deep," leading to the term Deep Learning.
  • Output Layer: This layer provides the final result of the processing.

Imagine a neural network as an information factory: raw data enters the first layer, undergoes complex "processing stages" in the hidden layers, and finally yields the finished product in the output layer.
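
Here is a bare-bones sketch of that factory in code, assuming an arbitrary toy architecture of 3 inputs, 4 hidden neurons, and 1 output. The weights are random placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

# Random stand-ins for trained parameters
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden layer -> output layer

def forward(p):
    hidden = log_sigmoid(W1 @ p + b1)     # hidden layer extracts intermediate features
    return log_sigmoid(W2 @ hidden + b2)  # output layer yields the "finished product"

print(forward(np.array([0.5, -1.0, 2.0])))   # raw data in, final result out
```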

A practical example of how a neural network works is the apple and orange classification problem. Suppose we have three characteristics to distinguish them: shape (round/oval), skin texture (smooth/rough), and weight (heavy/light). The neural network is "trained" on data describing these characteristics, automatically adjusting its weights and biases to find the most effective "boundary" separating the two fruit types.
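
As a rough sketch of that idea, a single trained neuron could already draw such a boundary. The feature encoding, weights, and labels below are invented for illustration and are not taken from the talk:

```python
import numpy as np

def hard_limit(n):
    return 1 if n >= 0 else 0   # 1 = apple, 0 = orange (illustrative labels)

# Hypothetical encoding: round = +1 / oval = -1, smooth = +1 / rough = -1, heavy = +1 / light = -1
apple  = np.array([1,  1, -1])   # round, smooth, light
orange = np.array([1, -1, -1])   # round, rough, light

# Weights and bias a trained neuron might settle on (made up for illustration):
w = np.array([0.0, 1.0, 0.0])    # here, skin texture alone separates the two fruits
b = 0.0

for features, name in [(apple, "apple"), (orange, "orange")]:
    print(name, "->", hard_limit(np.dot(w, features) + b))
```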

The learning process of a neural network involves repeating the following steps (a minimal code sketch follows the list):

  1. Receive Data: Input data is fed through the network.
  2. Make Prediction: The network produces its prediction from that data.
  3. Calculate Error: The network calculates the "wrongness" of its prediction (the error) and seeks to minimize this error.
  4. Adjust Weights: Weights and biases are adjusted based on the difference between the predicted and actual results, and the cycle repeats.
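
Here is a minimal sketch of that loop for the toy fruit classifier above, using the classic perceptron update rule. The training data, learning rate, and number of epochs are invented for illustration:

```python
import numpy as np

# Toy training data: feature vectors and labels (1 = apple, 0 = orange)
X = np.array([[1,  1, -1],    # apple
              [1, -1, -1],    # orange
              [1,  1,  1],    # apple
              [1, -1,  1]])   # orange
y = np.array([1, 0, 1, 0])

w, b, lr = np.zeros(3), 0.0, 0.1   # start with zero weights and a small learning rate

for epoch in range(10):
    for p, target in zip(X, y):
        prediction = 1 if np.dot(w, p) + b >= 0 else 0   # receive data, make a prediction
        error = target - prediction                      # calculate the error
        w = w + lr * error * p                           # adjust the weights...
        b = b + lr * error                               # ...and the bias, then repeat

print(w, b)   # after training, the weights encode the learned boundary
```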

Neural Networks in Action: Powering Modern AI Innovations

The knowledge of neural networks is not just theoretical; it's the foundation for many groundbreaking AI applications we see today.

Recurrent Neural Networks (RNNs) in Language Models

Before ChatGPT, chatbot models often utilized Recurrent Neural Networks (RNNs). A key characteristic of RNNs is their ability to process sequential information, where the output of one time step is fed back as input for the next. This explains why older chatbots would "type" word by word, building their responses gradually. Despite limitations in providing a holistic view of the answer, RNNs laid crucial groundwork for developing language models.
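
A minimal sketch of a single RNN step shows the key idea: the hidden state produced for one word is fed back in together with the next word. The sizes and randomly initialized weights below are placeholders for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 8, 4

# Random stand-ins for trained parameters
W_xh = rng.normal(size=(hidden_size, embed_size))    # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden (the recurrence)
b_h  = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """Combine the current word vector with the previous hidden state."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)                              # empty memory before the first word
for word_vector in rng.normal(size=(5, embed_size)):   # a made-up 5-word "sentence"
    h = rnn_step(word_vector, h)                       # each step depends on the one before it

print(h)   # the final hidden state summarizes the sequence, built up one word at a time
```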

The Rise of Transformers: ChatGPT's Breakthrough

The landscape of natural language processing underwent a profound transformation with the introduction of Transformers. This revolutionary architecture is the backbone of modern Large Language Models (LLMs), most famously embodied by ChatGPT. Transformers fundamentally address the limitations of RNNs by abandoning the sequential processing paradigm in favor of parallel processing.


As Neo precisely detailed, the key differentiator with Transformers, and by extension ChatGPT, is their ability to process an entire input sequence simultaneously. Instead of generating one word at a time and feeding it back, Transformers can analyze all the words in your query at once.

  • Parallel Processing in Action: If you ask ChatGPT, "Who is the current President of the USA?", the Transformer architecture doesn't just process "Who," then "is," then "the," etc. Instead, it concurrently analyzes "Who," "is," "the," "current," "President," "of," "the," "USA," and "?". This parallel analysis allows it to instantly grasp the entire context, the relationships between all the words, and the nuances of the query.
  • Improved Coherence and Intelligence: Because the entire input is processed in parallel, with mechanisms like self-attention weighing the importance of every word against every other word in context, the model builds a rich understanding of the whole prompt before it generates anything. Each word it then produces is conditioned on that full context, plus everything it has already written, which is why ChatGPT's answers feel so much more coherent, relevant, and human-like than those of earlier chatbots. A stripped-down sketch of the self-attention idea follows this list.
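
The sketch below shows scaled dot-product self-attention, the core operation that lets every position look at every other position in one pass. The projection matrices are random placeholders for what a real model learns during training:

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, d_model = 6, 16   # e.g. a 6-token input with 16-dimensional embeddings

X   = rng.normal(size=(seq_len, d_model))   # token embeddings for the whole input at once
W_q = rng.normal(size=(d_model, d_model))   # query, key, and value projections are
W_k = rng.normal(size=(d_model, d_model))   # learned in a real model; random here
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Every token scores its relevance to every other token in a single matrix product
scores = Q @ K.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)                            # stable softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # rows sum to 1

output = weights @ V    # each token's new representation mixes in the whole sequence
print(output.shape)     # (6, 16): all positions updated in parallel, not one by one
```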

Neo went on to provide a concise yet profound explanation of ChatGPT's operational workflow, a process readily available in its public documentation:

Step 1: Supervised Fine-tuning (The "Copying Student")

  • This initial phase is akin to a student meticulously learning from a teacher. ChatGPT is trained on an enormous dataset consisting of countless text prompts paired with human-written, high-quality answers.
  • The model's primary task here is to mimic these examples. It learns to recognize patterns in prompts and generate responses that are structurally and stylistically similar to the human-provided answers. Neo described this simply: "The teacher says something, the student copies it exactly. This stage does nothing else; it just fine-tunes. It's like a student going to school, copying from a sample model." This step builds the core linguistic and factual knowledge.
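
As a rough illustration of that "copying" objective (a toy next-word setup, not OpenAI's actual training code), supervised fine-tuning pushes the model to assign high probability to the word the human-written answer actually used:

```python
import numpy as np

# Toy vocabulary and a single (prompt, human answer) training pair
vocab = ["the", "capital", "of", "france", "is", "paris"]
target = vocab.index("paris")        # the next word in the human-written answer

# Stand-in for the model's raw scores over the vocabulary for the next word
logits = np.array([0.2, -1.0, 0.1, 0.5, 0.3, 0.4])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

probs = softmax(logits)
loss = -np.log(probs[target])        # cross-entropy: small when the model already
print(round(loss, 3))                # puts high probability on "paris"
```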

Step 2: Reward Model Training (Human-Guided Refinement)

  • This is the critical step that significantly differentiates ChatGPT from its predecessors and contributes to its success. After supervised fine-tuning, the model is prompted to generate multiple different responses (e.g., 3-4 variations) to the same question.
  • Crucially, human evaluators (experts hired by OpenAI) then meticulously rank these generated responses from best to worst. They assess clarity, accuracy, relevance, and overall quality.
  • This human feedback is used to train a separate "reward model." This model learns to predict which responses humans would prefer, assigning a "reward score" to potential answers. If, for instance, four answers (A, B, C, D) are generated, humans might rank them D (best), then C, then A, then B. The reward model learns to replicate this ranking, essentially understanding what constitutes a "good" answer from a human perspective.
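
As a rough illustration of the idea (not OpenAI's actual code), a reward model can be trained with a pairwise ranking objective: for every pair where humans preferred one answer over another, nudge the model to score the preferred answer higher. The answer "features" and the ranking below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical answer representations (in reality these come from the language model)
answers = rng.normal(size=(4, 8))    # 4 candidate answers (A, B, C, D), 8 features each
human_ranking = [3, 2, 0, 1]         # indices ordered best -> worst, i.e. D, C, A, B

w = np.zeros(8)                      # a toy linear reward model

def reward(x):
    return w @ x                     # reward score for one answer

# Turn the ranking into (better, worse) pairs and train on them
pairs = [(human_ranking[i], human_ranking[j]) for i in range(4) for j in range(i + 1, 4)]

lr = 0.1
for _ in range(200):
    for better, worse in pairs:
        diff = reward(answers[better]) - reward(answers[worse])
        grad = -(1.0 - 1.0 / (1.0 + np.exp(-diff)))         # gradient of -log(sigmoid(diff))
        w -= lr * grad * (answers[better] - answers[worse])

print([round(reward(a), 2) for a in answers])   # learned scores should now follow the ranking
```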

Step 3: Reinforcement Learning (Self-Improvement Loop)

  • With the reward model in place, the final step involves the ChatGPT model iteratively refining itself through reinforcement learning from human feedback (RLHF).
  • The model generates responses, and the reward model (not humans directly in this phase) evaluates them, assigning a score.
  • If a response receives a high reward score, the model adjusts its internal parameters to reinforce the behaviors that led to that good answer. Conversely, low reward scores prompt the model to adjust its parameters to avoid similar poor responses in the future.
  • As Neo summarized, "It just learns. It answers, then it goes back and learns again. If it answers well, it gets a high reward score. If it answers poorly, it gets a low reward score, and it improves for the next time." This continuous feedback loop of generating, evaluating, and self-improving is what allows ChatGPT to achieve such remarkable levels of sophistication and human-like interaction.
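
A heavily simplified sketch of that loop: the "policy" below is just a preference over four canned answers and the reward model is a fixed set of scores, so it only illustrates the generate-score-reinforce cycle, nothing close to a real language model:

```python
import numpy as np

rng = np.random.default_rng(7)

logits = np.zeros(4)                               # toy policy: preferences over 4 answers
reward_scores = np.array([0.1, -0.5, 0.3, 1.0])    # stand-in reward model: answer 3 is "best"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr, baseline = 0.5, 0.0
for step in range(500):
    probs = softmax(logits)
    answer = rng.choice(4, p=probs)        # the model "answers"
    r = reward_scores[answer]              # the reward model scores the answer
    baseline += 0.01 * (r - baseline)      # a running average keeps the updates stable
    # REINFORCE: nudge the policy toward answers that scored above the baseline
    grad = -probs
    grad[answer] += 1.0
    logits += lr * (r - baseline) * grad

print(np.round(softmax(logits), 3))   # probability mass shifts toward the best-scoring answer
```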

This intricate, multi-stage training process, particularly the integration of human preference feedback and reinforcement learning, explains why models like ChatGPT exhibit such advanced conversational abilities and nuanced understanding compared to earlier, simpler chatbot architectures.

Neural Networks' Position in the AI Ecosystem

So, where do neural networks fit into the broader landscape of the AI ecosystem?

The AI ecosystem is a layered structure, where each layer plays an essential role:

  • Artificial Intelligence (AI): This is the outermost and most encompassing layer, referring to machines' ability to perform tasks requiring human intelligence. AI is the application layer, where technologies are deployed to solve real-world problems.
  • Machine Learning (ML): Nested within AI, Machine Learning is a set of algorithms that allow computers to "learn" from data without being explicitly programmed. These are the methods for teaching computers how to learn.
  • Neural Networks: Neural networks are the "heart" of Machine Learning, especially in complex applications. They are the structures where the actual learning process occurs.
  • Deep Learning: A subfield of neural networks, Deep Learning refers to neural networks with many hidden layers, allowing them to process vast amounts of data and learn more complex features.

The explosion of AI and Deep Learning in recent years isn't because the algorithms are new. In fact, many of the core neural network ideas date back to the 1940s and 1950s. The crucial factor making the difference is the remarkable advancement in hardware. The powerful processing capabilities of Graphics Processing Units (GPUs) and the advent of specialized chips have made it possible to train Deep Learning models on enormous amounts of data in much less time. This is why chip manufacturers like Nvidia have become key players in the global AI race.

Partner with CoderPush for Your AI Journey

At CoderPush, we are proud to have a team of AI native engineers like Neo Nguyen, who not only possess deep theoretical knowledge but also practical experience in developing and deploying cutting-edge AI solutions. Our profound understanding of neural networks, from basic principles to complex architectures like Transformers, enables us to design and build powerful AI systems that precisely meet your business needs and deliver tangible value.

We are committed to providing high-quality, data-driven, and reliably sourced insights, offering our partners and potential clients a clear understanding of CoderPush's superior capabilities and experience in the industry. Whether you are a startup founder seeking groundbreaking AI solutions, a software developer looking to advance your knowledge, or a CEO/CTO shaping your future technology strategy, we are ready to partner with you.


Are you looking for a partner capable of transforming your AI ideas into reality?

Contact CoderPush today to discover how we can help your business lead the AI revolution. With a team of experienced AI native engineers and a deep understanding of neural networks, we are confident in delivering innovative solutions that will help you optimize operations, enhance efficiency, and create a sustainable competitive advantage.

Contact us →
