What is cross-validation?

Beytullah Soylev
2 min read · Apr 5, 2023


Cross-validation is a technique used in machine learning and statistical modeling to evaluate the performance of a predictive model. The core idea is to repeatedly split the data into two parts: a training set, used to fit the model, and a validation set, used to test the model’s performance on data it has not seen.
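For concreteness, here is a minimal sketch of one such training/validation split using scikit-learn. The Iris dataset and logistic regression model are placeholder assumptions, not anything the article prescribes:

```python
# A minimal sketch of a single train/validation split with scikit-learn.
# Dataset and model (Iris + logistic regression) are placeholders.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                                # train on the training set
print("Validation accuracy:", model.score(X_val, y_val))   # evaluate on the held-out set
```

Cross-validation repeats this split several times instead of relying on a single one, as described next.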

Cross-Validation

The basic process of cross-validation involves the following steps, sketched in code after the list:

  1. Dividing the data into k subsets (or “folds”) of equal size.
  2. Selecting one fold as the validation set and the remaining k-1 folds as the training set.
  3. Training the model on the training set and evaluating its performance on the validation set.
  4. Repeating steps 2–3 k times, using a different fold as the validation set each time.
  5. Computing the average performance across the k validation sets.
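A rough Python sketch of those five steps, again assuming scikit-learn, the Iris dataset, and a logistic regression model purely for illustration:

```python
# A sketch of the five steps above using scikit-learn's KFold splitter.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)      # step 1: divide into k folds

scores = []
for train_idx, val_idx in kf.split(X):                     # steps 2 and 4: rotate the validation fold
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                  # step 3: train on the k-1 training folds
    scores.append(model.score(X[val_idx], y[val_idx]))     # step 3: evaluate on the validation fold

print("Mean accuracy over", k, "folds:", np.mean(scores))  # step 5: average the scores
```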

The result of cross-validation is an estimate of the model’s performance on new, unseen data. Because every fold serves as the validation set exactly once, the final score is averaged over k different train/validation splits, which makes it a more robust estimate of the model’s performance than the score from a single held-out validation set.
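In practice the loop above is usually wrapped by a helper. The sketch below uses scikit-learn’s cross_val_score with the same placeholder dataset and model as before:

```python
# The same averaged estimate via scikit-learn's cross_val_score helper,
# which runs the fold loop internally and returns one score per fold.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Estimated accuracy:", scores.mean())
```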

Some common variations of cross-validation include k-fold cross-validation, leave-one-out cross-validation, and stratified cross-validation. The choice of cross-validation technique depends on the specific problem and the amount and quality of available data.
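In scikit-learn these variations are available as interchangeable splitter objects that can be passed to cross_val_score. A sketch comparing them, still on the placeholder dataset and model:

```python
# Common cross-validation variants as scikit-learn splitters, passed via cv=.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold, LeaveOneOut, StratifiedKFold, cross_val_score
)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

splitters = {
    "k-fold": KFold(n_splits=5, shuffle=True, random_state=42),
    "leave-one-out": LeaveOneOut(),  # each validation set is a single sample
    "stratified k-fold": StratifiedKFold(n_splits=5, shuffle=True, random_state=42),  # preserves class ratios per fold
}

for name, cv in splitters.items():
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: mean accuracy = {scores.mean():.3f} over {len(scores)} folds")
```

Stratified k-fold is the usual default for classification with imbalanced classes, while leave-one-out is mostly reserved for very small datasets because it requires one model fit per sample.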

“If you want to live a happy life, tie it to a goal, not to people or things.” – Albert Einstein

Happy reading :D
