The intersection of artificial intelligence and classical Indian linguistics offers a profound epistemological framework for analyzing complex computational architectures. Classical Sanskrit, engineered with algorithmic precision by grammarians such as Pāṇini, operates on a highly structured system of foundational roots (dhātu) and morphological rules (sūtra) that closely mirrors the logic of modern programming languages, state machines, and machine learning systems.1 The ensuing treatise undertakes a comprehensive codification of modern neural network architectures into Sanskrit verse, functioning as a traditional Śāstra (scientific treatise) complete with an auto-commentary (Svopajña-bhāṣya).
Relying upon foundational literature detailing the programmatic construction of neural networks 3 and standardized linguistic frameworks established by the Commission for Scientific and Technical Terminology (CSTT) 4, the analysis distills the mathematical, programmatic, and theoretical essence of deep learning into a structured, metrical format. The treatise maps out the core concepts, filters transient programmatic artifacts, and executes a rigorous versification governed by Pāṇinian morphology.

Structural Mapping and Methodological Filtration

The source literature delineating neural network architecture from fundamental principles contains vast amounts of programmatic syntax, environment configuration instructions, repetitive array printouts, and anecdotal text regarding dataset origins.3 To translate this corpus into a classical Sūtram or Kārikā (verse) format, the material undergoes a systematic filtration process.
The transient artifacts omitted from versification include Python package installations, object-oriented boilerplate (such as class initialization and self references), Matplotlib visualization parameters, and the repeated treatment of single concepts with plain Python lists prior to the introduction of optimized matrix operations.3
The core elements retained for codification encompass the mathematical definitions of dense layers (dot products and matrix multiplication), non-linear transformations via activation functions, error quantification paradigms (loss functions), the calculus of backpropagation (partial derivatives and the chain rule), optimization mechanics, generalization techniques, and out-of-sample evaluation paradigms.3
The treatise is organized into seven specific chapters (Prakaraṇas). The verse counts are meticulously estimated based on the density of the mathematical concepts required for strict codification.

| Prakaraṇa (Chapter) | Focus of Codification | Estimated Verses | Selected Meter (Chhandas) | Rationale for Meter Selection |
|---|---|---|---|---|
| I. Jñānatantu-Racanā | Forward pass, weights, biases, dot products, and batch processing.3 | 2 | Anuṣṭubh (8 syllables/quarter) | Ideal for establishing foundational axioms and structural definitions. |
| II. Udīpanakarma | ReLU, Softmax, Sigmoid, and Linear activation functions.3 | 3 | Upajāti (11 syllables/quarter) | Provides the rhythmic expansion necessary for mapping non-linear mathematical nuances. |
| III. Doṣamāna-Nirṇaya | Categorical Cross-Entropy, Binary Cross-Entropy, and Mean Squared Error.3 | 3 | Vasantatilakā (14 syllables/quarter) | Suited for precise calculations and multi-variable equations. |
| IV. Pratiprasāraṇam | Derivatives, partial derivatives, the Chain Rule, and Backpropagation.3 | 3 | Śārdūlavikrīḍita (19 syllables/quarter) | Capable of holding dense algorithmic steps and recursive gradient flows. |
| V. Iṣṭatamīkaraṇam | Gradient Descent, Learning Rates, Momentum, and the Adam Optimizer.3 | 3 | Mandākrāntā (17 syllables/quarter) | Mirrors the slow, deliberate, step-wise descent across a loss topology. |
| VI. Niyamīkaraṇam | L1/L2 Regularization and Dropout layers.3 | 2 | Anuṣṭubh (8 syllables/quarter) | Clear, condition-based rules for penalization and neural omission. |
| VII. Parīkṣaṇa | Validation, generalization, shuffling, and accuracy evaluation.3 | 2 | Śālinī (11 syllables/quarter) | Provides a conclusive, rhythmic cadence for terminal validation metrics. |

Lexical Framework and Dhātu-Pāṭha Derivations

The translation of machine learning terminology into Sanskrit demands precise morphological derivations to reflect computational mechanics.7 The derivations rely heavily on the Dhātu-Pāṭha (the authoritative list of verbal roots), ensuring adherence to Pāṇinian grammar.2 The treatise occasionally employs metrical synonyms - alternative words with different syllable counts representing the same concept - to satisfy the strict constraints of Sanskrit prosody.

| English Term | Primary Sanskrit Term | Metrical Alternative | Dhātu (Root) & Morphological Derivation | Contextual Justification |
|---|---|---|---|---|
| Neural Network | Yantra-nāḍī-vyūha | Jñānatantu-jāla | yantra (machine) + nāḍī (nerve) + vyūha (matrix) | Represents an artificial structure resembling biological neural pathways.10 |
| Neuron | Jñānakośa | Kalaśa | √jñā (to know) + kośa (cell/receptacle) | Acts as the fundamental unit holding and transmitting an activation state.8 |
| Weight | Bhāra | Guru | √bhṛ (to bear/support) + ghañ | Represents the magnitude of a connection, scaling the input.12 |
| Bias | Pakṣapāta | Nati | √nam (to bend) + ktin | Represents the constant offset shifting the activation threshold.14 |
| Activation Function | Udīpanavidhi | Bodhanakriyā | ud-√īp (to ignite) + vidhi (rule) | Denotes the mathematical threshold determining if a neuron fires.15 |
| Loss Function | Doṣamāna | Hāni | √duṣ (to be corrupted) + māna (measure) | Quantifies the error or divergence from the true target value.17 |
| Gradient Descent | Avarohaṇa-mārga | Kṣayamārga | ava-√ruh (to descend) + mārga (path) | Describes the step-wise descent across a loss topology.18 |
| Derivative | Avakalaja | Bhedaka | ava-√kal (to break down/calculate) + ja (born of) | Reflects the instantaneous rate of change in calculus.20 |
| Backpropagation | Pratiprasāraṇa | Paścādgati | prati (back) + pra-√sṛ (to flow) + lyuṭ | Describes the backward flowing of error gradients.6 |
| Optimization | Iṣṭatamīkaraṇa | Sādhana | iṣṭa (desired) + tama (most) + karaṇa (making) | The mathematical process of finding the optimal parameter state.23 |

Prakaraṇa I: Jñānatantu-Racanā (Architecture of the Neuron)

The fundamental operation of a dense neural network layer involves accepting inputs, scaling them by their respective weights, and aggregating the result with a bias term.3

Verse 1 - Anuṣṭubh

क्रमाद्-दत्तानि भारैश्-च संगुण्याथापि यत्नतः
नतिं चैव समायोज्य नाडीकोशः प्रवर्तते ॥ १ ॥
IAST transliteration
kramād-dattāni bhāraiś-ca saṃguṇyāthāpi yatnataḥ natiṃ caiva samāyojya nāḍīkośaḥ pravartate || 1 ||

Padaccheda:
kramāt (sequentially) dattāni (given inputs) bhāraiḥ (by weights) ca (and) saṃguṇya (having multiplied) athāpi (and then) yatnataḥ (carefully) |
natiṃ (bias/offset) ca (and) eva (indeed) samāyojya (having added) nāḍīkośaḥ (the neuron) pravartate (functions/activates) ||
Vyākaraṇa (Morphological Derivation):

  • saṃguṇya: Prefix sam + Root √guṇ (Tenth conjugation, Curādi, to multiply) + lyap (gerund suffix). Denotes an action completed prior to the subsequent action.
  • samāyojya: Prefix sam + ā + Root √yuj (Tenth conjugation, to join/add) + lyap.
  • pravartate: Prefix pra + Root √vṛt (First conjugation, Bhavādi, to function/operate) + laṭ (present tense), third-person singular, ātmanepada.
  • natiṃ: Accusative singular of nati (bias), derived from √nam (to bend) + ktin.

Mathematical Equivalency and Commentary: The equation codified here is the basic linear combination: Output = $\sum (inputs \times weights) + bias$. In the computational paradigm, a single neuron takes a vector of features, scales each feature by a trainable weight parameter, aggregates the sum, and shifts the result by a trainable bias parameter.3 The verse aligns directly with the raw programmatic logic output = inputs[0]*weights[0] + inputs[1]*weights[1] + ... + bias used prior to the introduction of array mathematics. The bias (nati) is crucial as it offsets the overall function vertically, allowing the network to map dynamics that do not naturally pass through the origin.3
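As a concrete illustration of the codified linear combination, the following minimal Python sketch computes a single neuron's output; the numerical values are illustrative, not drawn from the source text.

```python
# A single neuron as codified in Verse 1: scale each input (datta) by its
# weight (bhāra), aggregate the sum, and shift the result by the bias (nati).
inputs = [1.0, 2.0, 3.0]      # illustrative input signals
weights = [0.2, 0.8, -0.5]    # trainable weights
bias = 2.0                    # trainable bias

output = sum(x * w for x, w in zip(inputs, weights)) + bias
print(output)  # 0.2 + 1.6 - 1.5 + 2.0 = 2.3
```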

Verse 2 - Anuṣṭubh

समूहे यदि दत्तानि व्यूहास्-ते भारसंयुताः
बिन्दु-घातेन सर्वाणि फलानि प्राप्नुवन्ति हि ॥ २ ॥
IAST transliteration
samūhe yadi dattāni vyūhās-te bhārasaṃyutāḥ bindu-ghātena sarvāṇi phalāni prāpnuvanti hi || 2 ||

Padaccheda:
samūhe (in a batch) yadi (if) dattāni (inputs) vyūhāḥ (matrices) te (they) bhārasaṃyutāḥ (joined with weights) |
bindu-ghātena (by the dot product) sarvāṇi (all) phalāni (results) prāpnuvanti (obtain) hi (certainly) ||
Vyākaraṇa:

  • bindu-ghātena: Instrumental singular of bindughāta. bindu (dot) + ghāta (product/strike, from √han + ghañ). A repurposed technical term for the mathematical dot product.
  • bhārasaṃyutāḥ: Nominative plural, Bahuvrīhi compound. bhāra + sam + √yu + kta. Matrices endowed with weights.
  • prāpnuvanti: Prefix pra + Root √āp (Fifth conjugation, Svādi, to attain) + laṭ, third-person plural, parasmaipada.

Mathematical Equivalency and Commentary: The equation transitions to matrix notation: Layer Outputs = $\mathbf{X} \cdot \mathbf{W}^T + \mathbf{b}$. When inputs are presented in a batch (samūhe), they form a two-dimensional matrix.3 The weights are similarly structured as a matrix. To process multiple samples simultaneously and leverage parallel computing, the matrix dot product (bindughāta) is utilized. The transposition of the weight matrix is necessary for dimensional alignment during the dot product operation, codified programmatically as np.dot(inputs, np.array(weights).T) + biases.3 Fitting data in batches improves generalization by providing a higher statistical probability of making meaningful changes to weights during optimization, preventing the network from oscillating endlessly around individual sample anomalies.3
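A minimal NumPy sketch of the batched forward pass follows; the matrix values are illustrative, and the weight matrix is assumed to be stored row-per-neuron, so it is transposed before the dot product as in the commentary above.

```python
import numpy as np

# Batched forward pass as codified in Verse 2: the input batch (samūha) and
# the weights form matrices (vyūha) combined via the dot product (bindughāta).
inputs = np.array([[1.0, 2.0, 3.0, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])        # 3 samples x 4 features
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])   # 3 neurons x 4 inputs
biases = np.array([[2.0, 3.0, 0.5]])

# Transposition aligns dimensions: (3, 4) . (4, 3) -> (3, 3)
layer_outputs = np.dot(inputs, weights.T) + biases
print(layer_outputs.shape)  # (3, 3): one row of 3 neuron outputs per sample
```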

Prakaraṇa II: Udīpanakarma (Activation Functions)

Dense layers exclusively perform linear transformations. To map complex, non-linear problems, the outputs of these layers must pass through non-linear activation functions (Udīpanavidhi).3

Verse 3 - Upajāti

शून्याद्-गरिष्ठं यदि मानम्-अस्ति तदेव मानं सततं विधेयम्
शून्याद्-अल्पिष्ठं यदि चापि मानं शून्यं भवेन्-नाडिक-बोधनाय ॥ ३ ॥
IAST transliteration
śūnyād-gariṣṭhaṃ yadi mānam-asti tadeva mānaṃ satataṃ vidheyam śūnyād-alpiṣṭhaṃ yadi cāpi mānaṃ śūnyaṃ bhaven-nāḍika-bodhanāya || 3 ||

Padaccheda:
śūnyāt (than zero) gariṣṭhaṃ (greater) yadi (if) mānam (value) asti (is), tat (that) eva (itself) mānaṃ (value) satataṃ (always) vidheyam (should be applied) |
śūnyāt (than zero) alpiṣṭhaṃ (lesser) yadi (if) ca (and) api (also) mānaṃ (value), śūnyaṃ (zero) bhavet (should become) nāḍika-bodhanāya (for the awakening of the neuron) ||
Vyākaraṇa:

  • gariṣṭhaṃ / alpiṣṭhaṃ: Nominative singular neuter. Superlative suffixes (iṣṭhan) applied to guru (heavy/great) and alpa (small).
  • vidheyam: Nominative singular neuter. Prefix vi + Root √dhā (Third conjugation, Juhotyādi, to put/apply) + yat (potential passive participle/kṛtya affix).
  • bhavet: Root √bhū (First conjugation, to be) + liṅ (optative mood), third-person singular.
  • bodhanāya: Dative singular of bodhana (awakening/activation), denoting purpose.

Mathematical Equivalency and Commentary: The equation codified is the ReLU activation function: $y = \max(0, x)$. The verse strictly dictates the piecewise logic: if the linear output is greater than zero, the exact value is retained. If the value is less than zero, the signal is zeroed out (clipped). This is programmatically executed via np.maximum(0, inputs).3 The ReLU function introduces the essential non-linearity required to fit dynamic shapes (such as sine waves) by allowing layers of neurons to create specific “areas of effect” where certain neurons are active while others remain silent.3 Without non-linear activation, regardless of the number of hidden layers, the entire neural network would collapse mathematically into a single linear matrix multiplication.3
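A brief sketch of the piecewise logic dictated by the verse, using the NumPy call cited above (the input values are illustrative):

```python
import numpy as np

# ReLU: values greater than zero pass unchanged; values below zero are clipped.
linear_outputs = np.array([-1.5, 0.0, 0.7, 3.2, -0.1])
activated = np.maximum(0, linear_outputs)
print(activated)  # [0.  0.  0.7 3.2 0. ]
```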

Verse 4 - Upajāti

घाताङ्क-रूपाणि फलानि कृत्वा सर्वैः समाहारफलैश्-च भक्ताः
विभागयोगेन हि लभ्यते या संभाव्यता कोमलतीक्ष्ण-माने ॥ ४ ॥
IAST transliteration
ghātāṅka-rūpāṇi phalāni kṛtvā sarvaiḥ samāhāraphalaiś-ca bhaktāḥ vibhāgayogena hi labhyate yā saṃbhāvyatā komalatīkṣṇa-māne || 4 ||

Padaccheda:
ghātāṅka-rūpāṇi (in the form of exponents) phalāni (outputs) kṛtvā (having made), sarvaiḥ (by all) samāhāraphalaiḥ (by the sum of outputs) ca (and) bhaktāḥ (divided) |
vibhāgayogena (by the method of distribution) hi (indeed) labhyate (is obtained) yā (which) saṃbhāvyatā (probability) komalatīkṣṇa-māne (in the Softmax computation) ||
Vyākaraṇa:

  • ghātāṅka: ghāta (power) + aṅka (number) referring to Euler’s number ($e$).
  • bhaktāḥ: Nominative plural masculine. Root √bhaj (First conjugation, to divide/apportion) + kta (past passive participle).
  • saṃbhāvyatā: Nominative singular feminine. Root sam + √bhū + ṇyat + tal. Defines probability or likelihood.
  • komalatīkṣṇa: komala (soft) + tīkṣṇa (max/sharp). A direct morphological calque designed to represent “Softmax.” Metrical alternative used here to satisfy the 11-syllable constraint.

Mathematical Equivalency and Commentary: The equation codified is the Softmax normalization: $S_{i,j} = \frac{e^{z_{i,j}}}{\sum_{l=1}^{L} e^{z_{i,l}}}$. The verse dictates that by exponentiating the uncalibrated raw outputs of the preceding layer and dividing each by the sum of all exponentiated outputs in that sample, a normalized probability distribution is formed.3 Exponentiation using Euler’s number ($e \approx 2.718$) ensures that all negative values become positive, which is a mathematical prerequisite for a valid probability distribution.3 It also ensures that the function remains monotonic, preserving the rank order of the predictions. Programmatically, numerical instability (exploding floating-point numbers) is avoided by subtracting the maximum value from the inputs prior to exponentiation: exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True)).3 The resulting distribution sums to exactly 1.0, representing the network’s confidence scores across multiple mutually exclusive classes.
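The normalization described by the verse can be sketched as follows; the raw output values are illustrative.

```python
import numpy as np

# Softmax as codified in Verse 4: exponentiate the raw outputs, then divide
# each by the row-wise sum so every sample forms a probability distribution.
raw_outputs = np.array([[4.8, 1.21, 2.385],
                        [8.9, -1.81, 0.2]])

# Subtracting the row maximum prevents overflow without altering the result.
exp_values = np.exp(raw_outputs - np.max(raw_outputs, axis=1, keepdims=True))
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
print(probabilities.sum(axis=1))  # each row sums to 1.0
```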

Verse 5 - Upajāti

द्वन्द्वात्मिका यदि भवेन्-मतिश्-च स्याद्-वक्र-रेखा गणिते तु योग्या
निर्बाध-रूपेण फलं च यत्र ऋज्वी-गतिः सूचयति प्रमाणम् ॥ ५ ॥
IAST transliteration
dvandvātmikā yadi bhaven-matiś-ca syād-vakra-rekhā gaṇite tu yogyā nirbādha-rūpeṇa phalaṃ ca yatra ṛjvī-gatiḥ sūcayati pramāṇam || 5 ||

Padaccheda:
dvandvātmikā (binary/dual-natured) yadi (if) bhavet (should be) matiḥ (the decision/classification) ca (and) syāt (should be) vakra-rekhā (the curved line/sigmoid) gaṇite (in the calculation) tu (indeed) yogyā (appropriate) |
nirbādha-rūpeṇa (in an unobstructed form) phalaṃ (the result) ca (and) yatra (where) ṛjvī-gatiḥ (the straight path/linear activation) sūcayati (indicates) pramāṇam (the continuous measure) ||
Vyākaraṇa:

  • dvandvātmikā: dvandva (pair/binary) + ātman (nature) + ṭāp. Represents binary logistic regression.
  • vakra-rekhā: vakra (curved/S-shaped) + rekhā (line). Represents the Sigmoid curve.
  • ṛjvī-gatiḥ: ṛju (straight/linear) + ṅīp + gati (path). Represents the Linear activation function.
  • sūcayati: Root √sūc (Tenth conjugation, to indicate/point out) + laṭ, third-person singular.

Mathematical Equivalency and Commentary: For binary classification, the architecture diverges from Softmax and utilizes the Sigmoid activation function: $y = \frac{1}{1 + e^{-x}}$. The Sigmoid function mathematically confines the output strictly between the boundaries of 0 and 1, providing an independent probability threshold for a single output neuron.3 Conversely, when the neural network is tasked with predicting continuous numerical values (regression analysis), such as localized environmental metrics or financial pricing models, the output layer requires a Linear Activation Function (ṛjvī-gati). The Linear Activation function allows the scalar output of the dense layer to pass through completely unobstructed ($y = x$).3
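A compact sketch contrasting the two terminal activations named in the verse (input values are illustrative):

```python
import numpy as np

# Sigmoid (vakra-rekhā) confines outputs to (0, 1) for binary decisions;
# the linear activation (ṛjvī-gati) passes regression outputs through unchanged.
z = np.array([-2.0, 0.0, 3.0])
sigmoid = 1 / (1 + np.exp(-z))   # y = 1 / (1 + e^(-x))
linear = z                       # y = x
print(sigmoid, linear)
```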

Prakaraṇa III: Doṣamāna-Nirṇaya (Calculus of Loss)

A neural network must quantify its error to understand how to adjust its parameters. The loss function compares the predicted outputs against the ground-truth targets.3

Verse 6 - Vasantatilakā

सत्यं हि मानं लघुगणकेन युक्तं, भिन्नेन दृष्टम्-अतुलं च तथा ऋणेन
लक्ष्यानुसारं प्रपतत्य्-अदोषो, हानेः प्रमाणं प्रकटीकरोति हि ॥ ६ ॥
IAST transliteration
satyaṃ hi mānaṃ laghugaṇakena yuktaṃ, bhinnena dṛṣṭam-atulaṃ ca tathā ṛṇena lakṣyānusāraṃ prapataty-adoṣo, hāneḥ pramāṇaṃ prakaṭīkaroti hi || 6 ||

Padaccheda:
satyaṃ (true) hi (indeed) mānaṃ (value) laghugaṇakena (with the logarithm) yuktaṃ (joined/multiplied), bhinnena (with the predicted fraction) dṛṣṭam (seen/calculated) atulaṃ (incomparable) ca (and) tathā (thus) ṛṇena (with the negative sign) |
lakṣyānusāraṃ (according to the target) prapatati (descends) adoṣaḥ (the faultless measure), hāneḥ (of the loss) pramāṇaṃ (measure) prakaṭīkaroti (reveals) hi (certainly) ||
Vyākaraṇa:

  • laghugaṇakena: Instrumental singular. laghu (small) + gaṇaka (calculator). A technical adaptation designed to represent the natural logarithm ($\ln$).
  • ṛṇena: Instrumental of ṛṇa (negative/debt), denoting multiplication by the scalar -1.
  • prakaṭīkaroti: prakaṭa (manifest) + cvi affix + √kṛ (to make). To make manifest or reveal.

Mathematical Equivalency and Commentary: The equation codified is the Categorical Cross-Entropy loss formula: $L_i = - \sum y_{i,j} \ln(\hat{y}_{i,j})$. The verse delineates that by multiplying the true target value (often presented as a one-hot encoded vector where the target class is 1 and all others are 0) by the natural logarithm of the predicted confidence score, and applying a negative scalar, the definitive measure of loss is quantified.3 Because the target vector is populated predominantly with zeros, the programmatic implementation bypasses the full summation. It utilizes array indexing to extract only the predicted confidence score at the index of the true class, computing the negative log exclusively on that value: -np.log(correct_confidences).3 To prevent the logarithmic function from attempting to calculate $\ln(0)$ - which mathematically approaches negative infinity and causes critical software failures - the raw prediction arrays are strictly clipped to a range of $1 \times 10^{-7}$ to $1 - 1 \times 10^{-7}$.3
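The clipping and negative-log extraction described above can be sketched as follows, assuming sparse integer class labels (all values are illustrative):

```python
import numpy as np

# Categorical cross-entropy as codified in Verse 6: clip predictions to avoid
# log(0), index the confidence at the true class, apply the negative log.
predictions = np.array([[0.7, 0.1, 0.2],
                        [0.1, 0.5, 0.4],
                        [0.02, 0.9, 0.08]])
y_true = np.array([0, 1, 1])  # index of the correct class for each sample

clipped = np.clip(predictions, 1e-7, 1 - 1e-7)
correct_confidences = clipped[range(len(predictions)), y_true]
sample_losses = -np.log(correct_confidences)
print(sample_losses.mean())  # mean loss across the batch
```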

Verse 7 - Vasantatilakā

द्वन्द्वे विकल्पे द्विविधं हि मानं, सत्येन युक्तं त्व्-अनृतेन चापि
मध्ये वियोगं लघुगणकेन, भागं प्रदर्श्य गणितं दिशति प्रमाणम् ॥ ७ ॥
IAST transliteration
dvandve vikalpe dvividhaṃ hi mānaṃ, satyena yuktaṃ tv-anṛtena cāpi madhye viyogaṃ laghugaṇakena, bhāgaṃ pradarśya gaṇitaṃ diśati pramāṇam || 7 ||

Padaccheda:
dvandve (in binary) vikalpe (choice) dvividhaṃ (two-fold) hi (indeed) mānaṃ (measure), satyena (with the true) yuktaṃ (joined) tu (but) anṛtena (with the false/inverse) ca (and) api (also) |
madhye (in the middle) viyogaṃ (separation/subtraction) laghugaṇakena (with the logarithm), bhāgaṃ (fraction) pradarśya (having shown) gaṇitaṃ (the mathematics) diśati (points out) pramāṇam (the measure) ||
Vyākaraṇa:

  • dvividhaṃ: dvi (two) + vidhā (kind/way). Represents the calculation across both the $y$ and $(1-y)$ paradigms.
  • anṛtena: Instrumental singular of anṛta (false/not true). Represents the inverse target $(1 - y_{true})$.
  • viyogaṃ: Prefix vi + Root √yuj + ghañ. Separation or subtraction.

Mathematical Equivalency and Commentary: For binary logistic regression, the loss function evaluates both the likelihood of the true class and the inverse likelihood. The equation is: $L = - \frac{1}{J} \sum \left( y_{true} \ln(y_{pred}) + (1 - y_{true}) \ln(1 - y_{pred}) \right)$. The programmatic calculation extracts the mean of these independent binary log-likelihoods along the terminal axis: np.mean(sample_losses, axis=-1).3
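A minimal sketch of the two-fold (dvividha) computation, with illustrative predictions and targets:

```python
import numpy as np

# Binary cross-entropy as codified in Verse 7: weigh the log-likelihood of the
# true class and of its inverse (anṛta), then average along the last axis.
y_pred = np.clip(np.array([[0.9], [0.2], [0.7]]), 1e-7, 1 - 1e-7)
y_true = np.array([[1], [0], [1]])

sample_losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
sample_losses = np.mean(sample_losses, axis=-1)
print(sample_losses)
```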

Verse 8 - Vasantatilakā

मानान्तरं यदि गरिष्ठम्-अनेन दृष्टं, वर्गीकृतं तद्-अखिलं त्व्-अपराध-मूलम्
मध्यं फलं गणयितुं प्रबलं प्रमाणं, ऋज्वी-गतौ गणित-शास्त्र-विदः वदन्ति ॥ ८ ॥
IAST transliteration
mānāntaraṃ yadi gariṣṭham-anena dṛṣṭaṃ, vargīkṛtaṃ tad-akhilaṃ tv-aparādha-mūlam madhyaṃ phalaṃ gaṇayituṃ prabalaṃ pramāṇaṃ, ṛjvī-gatau gaṇita-śāstra-vidaḥ vadanti || 8 ||

Padaccheda:
māna-antaraṃ (difference in value) yadi (if) gariṣṭham (heavy/severe) anena (by this) dṛṣṭaṃ (seen), vargīkṛtaṃ (squared) tat-akhilaṃ (all of that) tu (indeed) aparādha-mūlam (the root of the error) |
madhyaṃ (mean/average) phalaṃ (result) gaṇayituṃ (to calculate) prabalaṃ (strong) pramāṇaṃ (measure), ṛjvī-gatau (in continuous/linear regression) gaṇita-śāstra-vidaḥ (the knowers of mathematics) vadanti (state) ||
Vyākaraṇa:

  • māna-antaraṃ: māna (measure) + antara (difference). Denotes the subtraction $(y_{true} - y_{pred})$.
  • vargīkṛtaṃ: varga (square) + cvi + kṛtam. Having been made into a square.
  • gaṇayituṃ: Root √gaṇ + tumun (infinitive).
  • gaṇita-śāstra-vidaḥ: Nominative plural. Knowers (vid) of the science of mathematics.

Mathematical Equivalency and Commentary: In regression models utilizing linear activation, the Mean Squared Error (MSE) is employed. The equation is: $L = \frac{1}{J} \sum (y_{true} - y_{pred})^2$. By squaring the difference between the prediction and the ground truth (vargīkṛtaṃ), the MSE function exponentially penalizes large deviations, aggressively forcing the optimizer to correct severe prediction discrepancies.3
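The squaring and averaging (vargīkṛta, madhya) can be sketched as follows, with illustrative regression targets:

```python
import numpy as np

# Mean squared error as codified in Verse 8: square the difference between
# target and prediction, then take the mean.
y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[1.1], [1.7], [3.4]])

sample_losses = np.mean((y_true - y_pred) ** 2, axis=-1)
print(sample_losses.mean())  # overall MSE for the batch
```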

Prakaraṇa IV: Pratiprasāraṇam (Calculus and Backpropagation)

The essence of machine learning lies in determining the precise impact of each specific parameter on the final loss function. This necessitates the computation of partial derivatives (āṃśika-avakalaja) utilizing the Chain Rule, recursively flowing gradients backward from the terminal loss function to the initial inputs.3

Verse 9 - Śārdūlavikrīḍita

बाह्यस्यावकलजेन हि गुणितं यद्-अन्तरङ्गम् फलं, शृङ्खला-नियमतः क्रमशः सर्वं समाकषति
भारस्यापि च पक्षपात-नतेः सत्यं प्रभावं तथा, निर्णीयाशु गणेन मूलपद-गं दोषं निरस्य स्फुटम् ॥ ९ ॥
IAST transliteration
bāhyasyāvakalajena hi guṇitaṃ yad-antaraṅgam phalaṃ, śṛṅkhalā-niyamataḥ kramaśaḥ sarvaṃ samākaṣati bhārasyāpi ca pakṣapāta-nateḥ satyaṃ prabhāvaṃ tathā, nirṇīyāśu gaṇena mūlapada-gaṃ doṣaṃ nirasya sphuṭam || 9 ||

Padaccheda:
bāhyasya (of the outer) avakalajena (by the derivative) hi (indeed) guṇitaṃ (multiplied) yat-antaraṅgam (that which is inner) phalaṃ (result),
śṛṅkhalā-niyamataḥ (by the chain rule) kramaśaḥ (sequentially) sarvaṃ (everything) samākaṣati (it extracts) |
bhārasya (of the weight) api (also) ca (and) pakṣapāta-nateḥ (of the bias) satyaṃ (true) prabhāvaṃ (impact) tathā (thus),
nirṇīya (having determined) āśu (swiftly) gaṇena (by the calculation) mūlapada-gaṃ (going to the root) doṣaṃ (error) nirasya (having cast away) sphuṭam (clearly) ||
Vyākaraṇa:

  • avakalajena: Instrumental of avakalaja (derivative).20 The derivative is the slope of the tangent line indicating the instantaneous rate of change.
  • śṛṅkhalā-niyamataḥ: Ablative generating suffix tasiL applied to śṛṅkhalā-niyama (chain rule).
  • samākaṣati: Prefix sam + ā + Root √kṛṣ (to pull/extract) + laṭ.
  • nirṇīya: Prefix nis + Root √nī (to lead/decide) + lyap.

Mathematical Equivalency and Commentary: The verse articulates the Chain Rule of calculus: $\frac{\partial L}{\partial x} = \frac{\partial f(g(x))}{\partial g(x)} \cdot \frac{\partial g(x)}{\partial x}$. To calculate the partial derivative of a chain of nested functions, the partial derivative of the outer function with respect to the inner function is multiplied by the partial derivative of the inner function with respect to its own parameters.3 Analytical derivatives, as opposed to numerical approximations, offer exact and computationally rapid solutions required for the multidimensional architecture of neural networks.3
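A toy illustration of the chain rule follows; the functions f and g are arbitrary choices made for the example, and the exact analytical derivative is compared against a numerical approximation.

```python
# Chain rule as codified in Verse 9, on the composition f(g(x)) with
# f(u) = u**2 and g(x) = 3*x + 1 (illustrative choices):
#   df/dx = df/dg * dg/dx = 2*(3*x + 1) * 3

def g(x):
    return 3 * x + 1

def f(u):
    return u ** 2

x = 2.0
analytical = 2 * g(x) * 3   # outer derivative multiplied by inner derivative

# Numerical approximation for comparison (slower and inexact, as noted above)
h = 1e-6
numerical = (f(g(x + h)) - f(g(x))) / h
print(analytical, round(numerical, 3))  # both approximately 42.0
```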

Verse 10 - Śārdūlavikrīḍita

पश्चाद्गामि-सरेण यन्त्र-पथगे दोष-प्रमाणे सति, प्रत्येकं ज्ञानकोश-बन्धन-गतं दोषस्य भागं पृथक्
व्यूहानां परिवर्तितेन गुणितं घातेन कृत्वा तु तत्, सर्वाश्-चैव निरञ्जनाः प्रतिकृतीः संदर्शयत्य्-अञ्जसा ॥ १० ॥
IAST transliteration
paścādgāmi-sareṇa yantra-pathage doṣa-pramāṇe sati, pratyekaṃ jñānakośa-bandhana-gataṃ doṣasya bhāgaṃ pṛthak vyūhānāṃ parivartitena guṇitaṃ ghātena kṛtvā tu tat, sarvāś-caiva nirañjanāḥ pratikṛtīḥ saṃdarśayaty-añjasā || 10 ||

Padaccheda:
paścādgāmi-sareṇa (by the backward-flowing flow/backpropagation) yantra-pathage (going through the machine’s path) doṣa-pramāṇe (when the error measure) sati (is),
pratyekaṃ (each) jñānakośa-bandhana-gataṃ (residing in the neuron’s connection) doṣasya (of the error) bhāgaṃ (fraction/gradient) pṛthak (separately) |
vyūhānāṃ (of the matrices) parivartitena (with the transposed) guṇitaṃ (multiplied) ghātena (by the dot product) kṛtvā (having done) tu (indeed) tat (that),
sarvāḥ (all) ca (and) eva (indeed) nirañjanāḥ (spotless/corrected) pratikṛtīḥ (representations) saṃdarśayati (shows) añjasā (straightaway) ||
Vyākaraṇa:

  • paścādgāmi-sareṇa: Instrumental singular. paścāt (backward) + gāmin (going) + sara (flow). A direct Sanskrit translation for the “Backpropagation” algorithm.6
  • parivartitena: Instrumental of parivartita (transposed/turned around). In matrix calculus, transposing arrays is necessary to align dimensions for the backward dot product.
  • saṃdarśayati: Prefix sam + Root √dṛś (to see), causative stem (ṇic), laṭ third-person singular. Causes to see, reveals.

Mathematical Equivalency and Commentary: During backpropagation, the neural network calculates the gradient of the loss function with respect to the outputs, and propagates this gradient backward layer by layer.6 For a dense layer, the gradient with respect to the weights is calculated via np.dot(inputs.T, dvalues) where dvalues is the gradient received from the subsequent layer.3 Transposing the inputs (inputs.T) aligns the rows of the batch data with the columns of the incoming gradients. Similarly, the gradient to be passed backward to the previous layer is calculated via np.dot(dvalues, weights.T). Summing the gradients along the batch axis captures the partial derivative for the biases: np.sum(dvalues, axis=0, keepdims=True).3 Backpropagation, operating as chain rule plus memoization 6, allows the network to assess all parameters concurrently.
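A sketch of the dense-layer backward pass using the formulas cited above; it assumes the weights are stored with shape (n_inputs, n_neurons), that the forward-pass inputs were cached, and that dvalues is the gradient arriving from the subsequent layer (here a placeholder of ones).

```python
import numpy as np

# Backward pass of a dense layer as described in Verse 10.
inputs = np.array([[1.0, 2.0, 3.0, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])        # cached forward inputs (3, 4)
weights = np.array([[0.2, 0.5, -0.26],
                    [0.8, -0.91, -0.27],
                    [-0.5, 0.26, 0.17],
                    [1.0, -0.5, 0.87]])            # (n_inputs=4, n_neurons=3)
dvalues = np.ones((3, 3))                          # placeholder incoming gradient

dweights = np.dot(inputs.T, dvalues)               # gradient w.r.t. weights (4, 3)
dbiases = np.sum(dvalues, axis=0, keepdims=True)   # gradient w.r.t. biases (1, 3)
dinputs = np.dot(dvalues, weights.T)               # gradient passed backward (3, 4)
print(dweights.shape, dbiases.shape, dinputs.shape)
```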

Verse 11 - Śārdūlavikrīḍita

संयुज्यापि च कोमलतीक्ष्ण-विधिना दोषस्य हानिं यदा, शीघ्रं सूत्र-फलं प्रदर्शयति तत् सत्यापवादेन हि
लक्ष्यात्-प्राप्तफलं वियोगम्-अनघं कृत्वा तु तत्-क्षणतो, सर्वान्-तन्तु-गणान् समीकरोति तद् गरिष्ठ-मानान्तरम् ॥ ११ ॥
IAST transliteration
saṃyujyāpi ca komalatīkṣṇa-vidhinā doṣasya hāniṃ yadā, śīghraṃ sūtra-phalaṃ pradarśayati tat satyāpavādena hi lakṣyāt-prāptaphalaṃ viyogam-anaghaṃ kṛtvā tu tat-kṣaṇato, sarvān-tantu-gaṇān samīkaroti tad gariṣṭha-mānāntaram || 11 ||

Padaccheda:
saṃyujya (having combined) api (also) ca (and) komalatīkṣṇa-vidhinā (with the Softmax rule) doṣasya (of the error) hāniṃ (loss) yadā (when),
śīghraṃ (rapidly) sūtra-phalaṃ (the formula’s result) pradarśayati (shows) tat (that) satya-apavādena (by the subtraction of the truth) hi (indeed) |
lakṣyāt (from the target) prāptaphalaṃ (the obtained result/prediction) viyogam (subtraction) anaghaṃ (faultless) kṛtvā (having done) tu (indeed) tat-kṣaṇataḥ (instantly),
sarvān (all) tantu-gaṇān (neuron groups) samīkaroti (equalizes) tat (that) gariṣṭha-māna-antaram (the great difference in measure) ||
Vyākaraṇa:

  • saṃyujya: Gerund denoting the mathematical synthesis of the two functions into a single step.
  • satya-apavādena: satya (truth/ground-truth) + apavāda (exception/subtraction).
  • samīkaroti: samī (cvi affix indicating making something equal that was not) + √kṛ.

Mathematical Equivalency and Commentary: Calculating the partial derivatives of the Softmax activation and the Categorical Cross-Entropy loss functions independently is computationally expensive, requiring the calculation of a full Jacobian matrix.3 However, when the algebraic derivatives of both functions are combined via the chain rule, the vast majority of the complex exponential mathematics cancels out. The resulting combined gradient equation remarkably simplifies to $\hat{y}_{i,k} - y_{i,k}$, which is merely the predicted probability subtracted by the true target value.3 Programmatically, this is executed by extracting the one-hot encoded true index and simply subtracting 1 from the predicted probability at that index: dinputs[range(samples), y_true] -= 1, followed by normalizing the gradient by dividing by the number of batch samples.3 This elegant simplification accelerates backward pass computations roughly seven-fold.3
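The simplified combined gradient can be sketched directly, assuming sparse integer targets (values are illustrative):

```python
import numpy as np

# Combined Softmax/cross-entropy gradient as codified in Verse 11: subtract 1
# at the true class index, then normalize by the number of samples.
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
y_true = np.array([0, 1, 1])

samples = len(softmax_outputs)
dinputs = softmax_outputs.copy()
dinputs[range(samples), y_true] -= 1   # predicted probability minus target
dinputs = dinputs / samples            # normalize gradient by batch size
print(dinputs)
```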

Prakaraṇa V: Iṣṭatamīkaraṇam (Optimization Mechanics)

Gradients dictate the direction of steepest ascent. To minimize loss, the parameters are adjusted by traveling in the exact opposite direction - descending the gradient topography. This process is Optimization (Iṣṭatamīkaraṇa).3

Verse 12 - Mandākrāntā

शिक्षागतिं दोष-नतिम् प्रगृह्य, प्रत्येक-भाराद् अपनीय नित्यम्
शनैः शनैर्-नीचतलं प्रयाति, तमोऽपहत्यै गणितस्य मार्गात्
एतद्-धि सत्यं क्षयमार्ग-यानम्, अविक्षितं बिन्दुगति-प्रकाशम्
वियुक्तम्-एकैकपदेन नित्यं, दोषापहारं कुरुते हि यन्त्रम् ॥ १२ ॥
IAST transliteration
śikṣāgatiṃ doṣa-natim pragṛhya, pratyeka-bhārād apanīya nityam śanaiḥ śanair-nīcatalaṃ prayāti, tamo'pahatyai gaṇitasya mārgāt etad-dhi satyaṃ kṣayamārga-yānam, avikṣitaṃ bindugati-prakāśam viyuktam-ekaikapadena nityaṃ, doṣāpahāraṃ kurute hi yantram || 12 ||

Padaccheda:
śikṣā-gatiṃ (learning rate) doṣa-natiṃ (gradient of the error) pragṛhya (having grasped), pratyeka-bhārāt (from each weight) apanīya (having subtracted) nityam (constantly) |
śanaiḥ śanaiḥ (gradually) nīcatalaṃ (the global minimum) prayāti (it approaches), tamaḥ-apahatyai (for the destruction of error) gaṇitasya (of mathematics) mārgāt (through the path) |
etat (this) hi (indeed) satyaṃ (true) kṣayamārga-yānam (vehicle of the descending path/gradient descent), avikṣitaṃ (unwavering) bindugati-prakāśam (manifesting point-wise movement) |
viyuktam (subtracted) eka-eka-padena (step by step) nityaṃ (always), doṣa-apahāraṃ (the removal of error) kurute (makes) hi (certainly) yantram (the machine) ||
Vyākaraṇa:

  • śikṣā-gatiṃ: Accusative singular. Learning rate. The hyperparameter determining the stride length of the descent.
  • doṣa-natiṃ: The gradient vector containing the partial derivatives.
  • kṣayamārga-yānam: kṣaya (decrease/descent) + mārga (path) + yāna (vehicle/journey). Gradient Descent.18

Mathematical Equivalency and Commentary: The verse codifies the foundational Stochastic Gradient Descent (SGD) update rule: $w_{new} = w_{old} - \alpha \nabla w$, where $\alpha$ represents the learning rate (śikṣāgati). By applying a negative scalar fraction to the gradients and updating the weights and biases, the optimizer forces the parameters to travel down the multi-dimensional loss topography toward a theoretical global minimum. Programmatically, this is executed via layer.weights += -self.learning_rate * layer.dweights.3 If the learning rate is too large, the optimizer may overshoot the minimum and destabilize; if too small, it may stagnate or become trapped in a local minimum.3
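A minimal optimizer sketch applying the update rule above; the layer object is assumed to expose weights, biases, and their gradients as NumPy arrays, and the class name is illustrative.

```python
# Vanilla SGD as codified in Verse 12: step each parameter against its
# gradient, scaled by the learning rate (śikṣāgati).
class SGDOptimizer:
    def __init__(self, learning_rate=1.0):
        self.learning_rate = learning_rate

    def update_params(self, layer):
        layer.weights += -self.learning_rate * layer.dweights
        layer.biases += -self.learning_rate * layer.dbiases
```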

Verse 13 - Mandākrāntā

वेगानुसारं स्मृतिम्-आददानः, संशोध्य मानं च तथा प्रभेदम्
आदम्-विधानेन तु यन्त्रबुद्धिः, क्षिप्रं समायाति हि लक्ष्यदेशम्
भिन्नां च शिक्षागतिम्-एक-भारे, प्रत्येति यन्त्रं गणितेन शुद्धम्
मार्गे स्थलित्वा न हि तिष्ठति स्म, वेगाधिकारात् परमार्थ-गामी ॥ १३ ॥
IAST transliteration
vegānusāraṃ smṛtim-ādadānaḥ, saṃśodhya mānaṃ ca tathā prabhedam ādam-vidhānena tu yantrabuddhiḥ, kṣipraṃ samāyāti hi lakṣyadeśam bhinnāṃ ca śikṣāgatim-eka-bhāre, pratyeti yantraṃ gaṇitena śuddham mārge sthalitvā na hi tiṣṭhati sma, vegādhikārāt paramārtha-gāmī || 13 ||

Padaccheda:
vega-anusāraṃ (according to momentum) smṛtim (memory) ādadānaḥ (taking), saṃśodhya (having corrected) mānaṃ (value) ca (and) tathā (also) prabhedam (variance) |
ādam-vidhānena (by the rule of Adam) tu (indeed) yantrabuddhiḥ (the machine intelligence), kṣipraṃ (swiftly) samāyāti (arrives at) hi (certainly) lakṣyadeśam (the target destination) |
bhinnāṃ (different) ca (and) śikṣāgatim (learning rate) eka-bhāre (on each individual weight), pratyeti (understands/applies) yantraṃ (the machine) gaṇitena (by the math) śuddham (pure) |
mārge (on the path) sthalitvā (having stumbled/fallen into a local minimum) na (not) hi (certainly) tiṣṭhati (stands/stops) sma (in the past), vega-adhikārāt (from the authority of momentum) paramārtha-gāmī (going to the ultimate truth/global minimum) ||
Vyākaraṇa:

  • vega-anusāraṃ: vega denotes velocity or momentum, acting as inertia to push past local minimums.
  • smṛtim: Memory represents the cache of past gradients stored by the optimizer.
  • ādadānaḥ: Root ā-√dā (to take) + śānac (present participle middle).
  • prabhedam: Accusative singular of prabheda. Variance, representing the squared gradients utilized in RMSProp and Adam algorithms.

Mathematical Equivalency and Commentary: The Adaptive Moment Estimation (Adam) optimizer integrates the inertia of Momentum with the adaptive, per-parameter learning rates of RMSProp.3 Adam maintains two moving average caches: $m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$ (the first moment, or mean of the gradients) and $v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$ (the second moment, or uncentered variance of the gradients).3 Bias corrections are subsequently applied to these caches to compensate for their initialization at zero, which would otherwise skew the moments toward zero during early training steps. By dividing the momentum by the square root of the variance cache, Adam dynamically adjusts the learning rate for each individual parameter (bhinnāṃ śikṣāgatim eka-bhāre), significantly accelerating convergence and preventing the model from stalling in localized topological depressions (mārge sthalitvā na hi tiṣṭhati).3
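A per-parameter sketch of the Adam update described above; the function signature, the toy loss, and the default hyperparameters are illustrative assumptions rather than the source implementation.

```python
import numpy as np

# Adam as summarized in Verse 13: momentum (smṛti) and variance (prabheda)
# caches with bias correction, yielding a per-parameter learning rate.
def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # corrections for zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([0.5, -1.2])
m, v = np.zeros_like(param), np.zeros_like(param)
for t in range(1, 4):                           # three illustrative update steps
    grad = 2 * param                            # gradient of the toy loss sum(param**2)
    param, m, v = adam_step(param, grad, m, v, t)
print(param)
```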

Prakaraṇa VI: Niyamīkaraṇam (Regularization and Dropout)

Deep learning models, possessing massive parameter capacities, are highly susceptible to overfitting - memorizing the noise of the training data rather than generalizing the underlying mathematical representation.3 Regularization penalizes complex weights, and Dropout layers introduce stochastic omission.

Verse 14 - Anuṣṭubh

भारस्य वर्ग-मानेन दण्डं दत्त्वा तु गण्यते
अति-स्मृतिविकारस्य शमनं सूत्रयोजिताम् ॥ १४ ॥
IAST transliteration
bhārasya varga-mānena daṇḍaṃ dattvā tu gaṇyate ati-smṛtivikārasya śamanaṃ sūtrayojitām || 14 ||

Padaccheda:
bhārasya (of the weight) varga-mānena (by the squared value) daṇḍaṃ (penalty) dattvā (having given) tu (indeed) gaṇyate (it is calculated) |
ati-smṛti-vikārasya (of the disease of over-memorization) śamanaṃ (pacification/cure) sūtrayojitām (applied in the formula) ||
Vyākaraṇa:

  • varga-mānena: varga (square) + māna (measure). Relates to the L2 penalty.
  • daṇḍaṃ: Penalty. Representing the lambda ($\lambda$) hyperparameter scaling the regularization loss.
  • ati-smṛti-vikārasya: ati (excess) + smṛti (memory) + vikāra (disease). A precise, conceptual translation for the phenomenon of “overfitting.”

Mathematical Equivalency and Commentary: L2 Regularization (Ridge) adds a penalty to the overarching loss function proportional to the squared magnitude of the weights: $L_{reg} = \lambda \sum w^2$. L1 Regularization (Lasso) utilizes the absolute value: $L_{reg} = \lambda \sum |w|$. By penalizing excessively large weights, the network is forced to distribute its predictive logic across a wider array of neurons, curing the disease of over-memorization (ati-smṛti-vikāra). Programmatically, the derivative of L2 applied during backpropagation is simply 2 * lambda * weights, pulling extreme parameter values back toward zero.3
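The penalty terms and the L2 gradient quoted above can be sketched as follows; the lambda values and the weight matrix are illustrative.

```python
import numpy as np

# L1/L2 regularization penalties as codified in Verse 14, added to the data loss.
def regularization_loss(weights, lambda_l1=0.0, lambda_l2=5e-4):
    loss = lambda_l1 * np.sum(np.abs(weights))   # L1: sum of absolute magnitudes
    loss += lambda_l2 * np.sum(weights ** 2)     # L2: sum of squared magnitudes
    return loss

weights = np.array([[0.2, -0.9], [1.5, 0.1]])
print(regularization_loss(weights))

# Derivative of the L2 term used during backpropagation (the daṇḍa pulling
# extreme weights back toward zero):
dL2 = 2 * 5e-4 * weights
```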

Verse 15 - Anuṣṭubh

यदृच्छया तु तन्तूनां लोपं कृत्वा पुनः पुनः
विभाज्य लोप-मानेन शुद्धिं यन्त्रं समाचरेत् ॥ १५ ॥
IAST transliteration
yadṛcchayā tu tantūnāṃ lopaṃ kṛtvā punaḥ punaḥ vibhājya lopa-mānena śuddhiṃ yantraṃ samācaret || 15 ||

Padaccheda:
yadṛcchayā (randomly) tu (indeed) tantūnāṃ (of the connections/neurons) lopaṃ (dropout/omission) kṛtvā (having done) punaḥ punaḥ (again and again) |
vibhājya (having divided) lopa-mānena (by the dropout rate) śuddhiṃ (normalization) yantraṃ (the machine) samācaret (should perform) ||
Vyākaraṇa:

  • yadṛcchayā: Instrumental of yadṛcchā (random chance/stochasticity).
  • lopaṃ: Root √lup (to disappear/drop) + ghañ. Directly translates “Dropout.”
  • vibhājya: Prefix vi + Root √bhaj (to divide) + lyap.

Mathematical Equivalency and Commentary: The Dropout function forces the network to learn robust feature representations by randomly disabling a fraction of neurons during each forward training pass, governed by a Bernoulli distribution probability ($q$). The equation is: Output = $\frac{inputs \times \text{Bernoulli}(1-q)}{1-q}$. Because a percentage of the network is silenced, the aggregate mathematical sum of the layer decreases. To ensure the network evaluates identically during testing (when all neurons are active), the outputs of the remaining active neurons are scaled up during training by dividing by $(1-q)$. This programmatic execution self.output = inputs * self.binary_mask / self.rate prevents the phenomenon of neural co-adaptation, where specific neurons become overly reliant on the outputs of sibling neurons.3
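A sketch of the training-time scaling described above; q and the input values are illustrative, and the mask is drawn from a Bernoulli distribution as in the equation.

```python
import numpy as np

# Dropout as codified in Verse 15: randomly silence a fraction q of neurons,
# then rescale the survivors by 1/(1 - q) so the expected magnitude is preserved.
q = 0.3                                            # dropout fraction
inputs = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])

binary_mask = np.random.binomial(1, 1 - q, size=inputs.shape)
output = inputs * binary_mask / (1 - q)
print(output)   # dropped positions are 0; retained positions are scaled up
```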

Prakaraṇa VII: Parīkṣaṇa (Out-of-Sample Evaluation)

The ultimate proof of a neural network’s intelligence is its performance on data it has never observed during the training and backpropagation cycles.3

Verse 16 - Śālinī

अज्ञातं यज्-ज्ञायते यन्त्र-बोधैर्, नूतं दत्तं पार्थ-शोधे समीक्ष्यम्
शिक्षा-दोषो दृश्यते यत्र तुल्यः, सत्यं ज्ञानं तच्-च सामान्य-रूपम् ॥ १६ ॥
IAST transliteration
ajñātaṃ yaj-jñāyate yantra-bodhair, nūtaṃ dattaṃ pārtha-śodhe samīkṣyam śikṣā-doṣo dṛśyate yatra tulyaḥ, satyaṃ jñānaṃ tac-ca sāmānya-rūpam || 16 ||

Padaccheda:
ajñātaṃ (the unknown) yat (which) jñāyate (is known) yantra-bodhaiḥ (by the machine’s cognitions), nūtaṃ (new) dattaṃ (data) pārtha-śodhe (in separate testing) samīkṣyam (should be observed) |
śikṣā-doṣo (the training error) dṛśyate (is seen) yatra (where) tulyaḥ (equal), satyaṃ (true) jñānaṃ (knowledge) tat (that) ca (and) sāmānya-rūpam (generalized form) ||
Vyākaraṇa:

  • ajñātaṃ: na (not) + jñāta (known). Out-of-sample data.
  • pārtha-śodhe: pārthakya (separation) + śodha (examination/testing). Testing dataset.
  • sāmānya-rūpam: sāmānya (general/universal) + rūpa (form). A direct correlation to “generalization.”

Mathematical Equivalency and Commentary: If a network memorizes the training data, its training loss will artificially approach zero, yet its performance on unseen data will catastrophically diverge.3 The dataset must be split into training blocks and separate validation/testing blocks. Data shuffling is mathematically mandatory; if the network trains on sequentially ordered target classes, it will endlessly chase local minimums corresponding to the immediate class batch and fail to discover a generalized global minimum.3 A network is only deemed to possess true knowledge (satyaṃ jñānaṃ) when its performance on out-of-sample validation arrays mirrors its training metrics.3
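A sketch of the shuffling, splitting, and accuracy comparison described above; the data and the stand-in predictions are synthetic and purely illustrative.

```python
import numpy as np

# Out-of-sample evaluation as described in Verse 16.
X = np.random.randn(1000, 4)                      # synthetic features
y = np.random.randint(0, 3, size=1000)            # synthetic class labels

keys = np.arange(len(X))
np.random.shuffle(keys)                           # shuffling avoids class-ordered batches
X, y = X[keys], y[keys]

split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]           # training block
X_val, y_val = X[split:], y[split:]               # held-out block (ajñāta)

# Accuracy: fraction of argmax predictions matching the held-out targets
predictions = np.random.rand(len(y_val), 3)       # stand-in for model outputs
accuracy = np.mean(np.argmax(predictions, axis=1) == y_val)
print(accuracy)
```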

Conclusions on the Synthesis of Epistemology and Computation

The structural mapping of deep learning architecture into classical Sanskrit verse transcends literary exercise; it represents a profound epistemological alignment. The source literature deconstructs the opaque “black box” of artificial intelligence into atomic, matrix-based mathematical operations, proving that intelligence can be synthesized through the application of rigorous calculus and linear algebra.3
Pāṇinian grammar operates on an identical philosophical premise: the entirety of human linguistic expression is generated from atomic roots (dhātu) via strict, algorithmic suffixes and conditional modifiers (pratyaya, āgama). The translation of natural language processing paradigms into classical frameworks is not merely theoretical; contemporary initiatives by entities like the Commission for Scientific and Technical Terminology (CSTT) and institutions such as IIT Bombay are actively leveraging this structural alignment.4 The creation of indigenous AI stacks - utilizing bilingual large language models and Sanskrit translation corpora - demonstrates that classical Indian linguistic logic provides a formidable computational foundation for machine learning systems.26
By mapping the matrix dot product to Bindu-ghāta, backpropagation to Pratiprasāraṇa, and overfitting to Ati-smṛti-vikāra, the modern calculus of deep learning is seamlessly integrated into a classical knowledge system. The codified verses encapsulate the entire lifecycle of a neural network - from the initial forward propagation of discrete signal matrices across weighted arrays, to the recursive descent of error gradients through multi-dimensional topologies. This synthesis demonstrates that the fundamental logic of artificial intelligence transcends the localized programmatic syntax of modern computer science and exists universally within the realm of structured, mathematical thought.

Works cited

  1. ‘It’s good coding’: Computer science students drawn to classes on Sanskrit, a 3,500-year-old language, University of Toronto, accessed March 28, 2026, https://www.utoronto.ca/news/it-s-good-coding-computer-science-students-drawn-classes-sanskrit-3500-year-old-language
  2. Restricted Code - Fifty Two, accessed March 28, 2026, https://fiftytwo.in/story/restricted-code/
  3. Neural Networks from Scratch in Python (PDF).
  4. Official Website of Commission for Scientific and Technical Terminology, Ministry of Education, Government of India - CSTT, accessed March 28, 2026, https://www.cstt.education.gov.in/en
  5. Sanskrit Language Glossary Category, Official Website of Commission for Scientific and Technical Terminology, Ministry of Education, Government of India, accessed March 28, 2026, http://cstt.education.gov.in/sanskrit-language-glossary-category
  6. Backpropagation Basics, Medium, accessed March 28, 2026, https://medium.com/@himalayaashish/backpropagation-from-basics-to-mastery-9d75a4a8a745
  7. The Sanskrit word for algorithm: Vṛtti वृत्ति - the red pill <.>, accessed March 28, 2026, http://the-redpill.blogspot.com/2018/04/the-sanskrit-word-for-algorithm-vrtti.html
  8. Translation: A Key Component of a Hundred-Year Project - Frontiers, accessed March 28, 2026, https://www.frontiersin.org/journals/communication/articles/10.3389/fcomm.2021.731354/full
  9. COMPUTATIONAL SANSKRIT & DIGITAL HUMANITIES - Gallium, accessed March 28, 2026, http://gallium.inria.fr/~huet/PUBLIC/WSC2018.pdf
  10. Towards an improved man and machine connect using Sanskrit by Nikhil Malhotra, accessed March 28, 2026, https://medium.com/@nickmalhotra/towards-an-improved-man-and-machine-connect-using-sanskrit-dd6878e20655
  11. Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks - ACL Anthology, accessed March 28, 2026, https://aclanthology.org/D18-1295/
  12. Indian mathematics - Wikipedia, accessed March 28, 2026, https://en.wikipedia.org/wiki/Indian_mathematics
  13. Pan-Indian Terms in Mathematics (1986) - cstt.education.gov.in, accessed March 28, 2026, https://www.cstt.education.gov.in/sites/default/files/pan-indian-terms-mathematics-1986.pdf
  14. Bias term: Significance and symbolism, accessed March 28, 2026, https://www.wisdomlib.org/concept/bias-term
  15. Hybrid Machine Translation System Using Deep Learning, accessed March 28, 2026, https://www.akademisains.gov.my/asmsj/?mdocs-file=4679
  16. Hybrid LETCNN-P Transformer Architecture for Enhanced Translation of Low-Resource Languages - IAENG, accessed March 28, 2026, https://www.iaeng.org/IJCS/issues_v52/issue_3/IJCS_52_3_27.pdf
  17. Speech and Language Processing - Stanford University, accessed March 28, 2026, https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
  18. Emptiness and Form - LessWrong, accessed March 28, 2026, https://www.lesswrong.com/posts/87MeyLu8Fy4xTj8d5/emptiness-and-form
  19. Handwritten Sanskrit Word Recognition: A Deep Learning Approach Using AlexNet » CTSP, accessed March 28, 2026, https://journals.stmjournals.com/ctsp/article=2025/view=215236/
  20. Digital Pāḷi Dictionary, accessed March 28, 2026, https://dpdict.net/?q=ka%E1%B9%87%E1%B8%8Da
  21. glossary-operations-research-english-hindi-ped-946.pdf - CSTT, accessed March 28, 2026, https://www.cstt.education.gov.in/sites/default/files/glossary-operations-research-english-hindi-ped-946.pdf
  22. Anticodians · Reading code critically in the antipodes, accessed March 28, 2026, https://anticodians.org/
  23. Unveiling the relevance of Vedic mathematics in the age of artificial intelligence, accessed March 28, 2026, https://ijels.com/upload_document/issue_files/83IJELS-10620256-Unveiling.pdf
  24. Optimization in India: A Historical Perspective - Ijarse, accessed March 28, 2026, http://ijarse.com/images/fullpdf/1505992264_IETEPune831.pdf
  25. Understanding Backpropagation in AI - IndiaAI, accessed March 28, 2026, https://indiaai.gov.in/article/understanding-backpropagation-in-ai
  26. Sanskrit to Python: AI Learns from India Front Page - YouTube, accessed March 28, 2026, https://www.youtube.com/watch?v=nuSD6We2Q1c
  27. Samasāmayik: A Parallel Dataset for Hindi-Sanskrit Machine Translation - arXiv, accessed March 28, 2026, https://arxiv.org/html/2603.24307v1