Machine learning models predict likely future outcomes from historical and current data. These models “learn” by analysing that data and capturing the patterns they find in a model that is then used to forecast outcomes. The models being developed at Smart Blood Analytics Swiss draw knowledge from historical medical cases and predict the diagnosis of new cases. It is this ability to generalize to new cases that allows us to use machine learning algorithms every day to make predictions about a patient’s condition.
Overfitting is a concept in data science that describes a statistical model fitting its training data too closely. When a model learns the noise in the historical medical cases rather than the underlying patterns, it becomes “overfitted” and is unable to generalize well to new cases. If a model cannot generalize well to new patients, it cannot perform the prediction tasks it was intended for. Generally, a machine learning algorithm is said to overfit if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight). One can intuitively understand overfitting by dividing the information from all past experiences into two groups: information that is relevant for the future and irrelevant information ("noise"). If a model relies heavily on irrelevant information for its reasoning and performs well only on historical data, it is overfitted. As we can see in the picture above, the baby has “overfitted”: it uses irrelevant information to reason about what the word 'father' means and consequently makes wrong calls in the future.
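The gap between hindsight and foresight can be made concrete with a small experiment. The sketch below is purely illustrative and is not how the models at Smart Blood Analytics Swiss are built: it assumes synthetic "cases" with a single measurement `x`, a true rule `x > 0.5`, and 20% of labels flipped at random to play the role of noise. All names (`make_cases`, `memorizer_predict`, `fit_threshold`, `run_demo`) are hypothetical. A model that memorizes every training case (1-nearest-neighbour) is compared with a simple one-parameter threshold rule.

```python
import random


def make_cases(n, rng, noise=0.2):
    """Synthetic cases: one measurement x in [0, 1) and a binary label.

    Assumption for illustration: the true rule is x > 0.5, and a
    fraction `noise` of the labels is flipped at random.
    """
    cases = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # irrelevant information ("noise")
        cases.append((x, label))
    return cases


def accuracy(cases, predict):
    """Fraction of cases for which predict(x) matches the label."""
    return sum(predict(x) == label for x, label in cases) / len(cases)


def memorizer_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training case."""
    nearest = min(train, key=lambda case: abs(case[0] - x))
    return nearest[1]


def fit_threshold(train):
    """Simple model: choose the cut-off from a coarse grid that best
    separates the training labels."""
    grid = [i / 20 for i in range(21)]
    return max(grid, key=lambda t: accuracy(train, lambda x: x > t))


def run_demo(seed=0, n_train=200, n_test=1000):
    rng = random.Random(seed)
    train = make_cases(n_train, rng)  # "historical" cases
    test = make_cases(n_test, rng)    # "new" cases
    t = fit_threshold(train)
    return {
        "memorizer_train": accuracy(train, lambda x: memorizer_predict(train, x)),
        "memorizer_test": accuracy(test, lambda x: memorizer_predict(train, x)),
        "simple_train": accuracy(train, lambda x: x > t),
        "simple_test": accuracy(test, lambda x: x > t),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the memorizing model reproduces every training label, including the flipped ones, so it looks perfect on historical cases and noticeably worse on new ones. The threshold rule cannot memorize noise, so its accuracy on historical and new cases stays close together.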