"A dictionary holding the accuracies for different values of $k$ that we find\n",
"when running cross-validation. After running cross-validation,\n",
"`k_to_accuracies[k]` should be a list of length num_folds giving the different\n",
"accuracy values that we found when using that value of $k$."
]
},
{
"cell_type": "code",
"metadata": {
"nbpresent": {
"id": "05e1ac10-1a25-4740-a21b-8b067116fd69"
}
},
"outputs": [],
"source": [
"k_to_accuracies = {}"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "f97b560b-929b-4a1f-90ee-3cf17ecef7e6"
}
},
"source": [
"We perform `num_folds`-fold cross-validation to find the best value of $k$. For each \n",
"possible value of $k$, run the $k$-nearest-neighbor algorithm `num_folds` times, \n",
"where in each case you use all but one of the folds as training data and the \n",
"held-out fold as a validation set. Store the accuracies for all folds and all \n",
"values of $k$ in the `k_to_accuracies` dictionary."
]
},
{
"cell_type": "code",
"metadata": {
"nbpresent": {
"id": "bc2a21b5-4387-4bfc-8851-abcf62acaafc"
}
},
"outputs": [],
"source": [
"for k in k_choices:\n",
"    k_to_accuracies[k] = []\n",
"    classifier = KNearestNeighbor_L1()\n",
"    for i in range(num_folds):\n",
"        # Train on every fold except fold i; fold i is the validation set.\n",
"        X_cv_training = np.concatenate([x for j, x in enumerate(X_train_folds) if j != i], axis=0)\n",
"        y_cv_training = np.concatenate([y for j, y in enumerate(y_train_folds) if j != i], axis=0)\n",
"        classifier.train(X_cv_training, y_cv_training)\n",
"        dists = classifier.compute_distances_two_loops(X_train_folds[i])\n",
"        y_val_pred = classifier.predict_labels(dists, k=k)\n",
"        # Fraction of validation examples classified correctly.\n",
"        k_to_accuracies[k].append(np.mean(y_train_folds[i] == y_val_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "c24db8cd-04a8-45a6-b15e-24194bb42248"
}
},
"source": [
"We print out the computed accuracies."
]
},
{
"cell_type": "code",
"metadata": {
"nbpresent": {
"id": "972c66f2-03ea-4de0-8ac6-a564c3365f50"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"k = 1, accuracy = 0.291000\n",
"k = 1, accuracy = 0.313000\n",
"k = 1, accuracy = 0.294000\n",
"k = 1, accuracy = 0.275000\n",
"k = 1, accuracy = 0.308000\n",
"k = 3, accuracy = 0.269000\n",
"k = 3, accuracy = 0.299000\n",
"k = 3, accuracy = 0.290000\n",
"k = 3, accuracy = 0.278000\n",
"k = 3, accuracy = 0.296000\n",
"k = 5, accuracy = 0.275000\n",
"k = 5, accuracy = 0.311000\n",
"k = 5, accuracy = 0.301000\n",
"k = 5, accuracy = 0.314000\n",
"k = 5, accuracy = 0.309000\n",
"k = 7, accuracy = 0.280000\n",
"k = 7, accuracy = 0.329000\n",
"k = 7, accuracy = 0.313000\n",
"k = 7, accuracy = 0.320000\n",
"k = 7, accuracy = 0.313000\n",
"k = 9, accuracy = 0.291000\n",
"k = 9, accuracy = 0.314000\n",
"k = 9, accuracy = 0.310000\n",
"k = 9, accuracy = 0.322000\n",
"k = 9, accuracy = 0.315000\n",
"k = 10, accuracy = 0.289000\n",
"k = 10, accuracy = 0.312000\n",
"k = 10, accuracy = 0.320000\n",
"k = 10, accuracy = 0.323000\n",
"k = 10, accuracy = 0.313000\n",
"k = 12, accuracy = 0.295000\n",
"k = 12, accuracy = 0.320000\n",
"k = 12, accuracy = 0.324000\n",
"k = 12, accuracy = 0.332000\n",
"k = 12, accuracy = 0.318000\n",
"k = 15, accuracy = 0.287000\n",
"k = 15, accuracy = 0.324000\n",
"k = 15, accuracy = 0.317000\n",
"k = 15, accuracy = 0.319000\n",
"k = 15, accuracy = 0.321000\n",
"k = 18, accuracy = 0.289000\n",
"k = 18, accuracy = 0.321000\n",
"k = 18, accuracy = 0.307000\n",
"k = 18, accuracy = 0.319000\n",
"k = 18, accuracy = 0.306000\n",
"k = 20, accuracy = 0.287000\n",
"k = 20, accuracy = 0.327000\n",
"k = 20, accuracy = 0.309000\n",
"k = 20, accuracy = 0.307000\n",
"k = 20, accuracy = 0.306000\n",
"k = 50, accuracy = 0.285000\n",
"k = 50, accuracy = 0.301000\n",
"k = 50, accuracy = 0.294000\n",
"k = 50, accuracy = 0.290000\n",
"k = 50, accuracy = 0.293000\n",
"k = 100, accuracy = 0.283000\n",
"k = 100, accuracy = 0.285000\n",
"k = 100, accuracy = 0.279000\n",
"k = 100, accuracy = 0.285000\n",
"k = 100, accuracy = 0.277000\n"
]
}
],
"source": [
"for k in sorted(k_to_accuracies):\n",
"    for accuracy in k_to_accuracies[k]:\n",
"        print('k = %d, accuracy = %f' % (k, accuracy))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "57f5291f-1e32-456f-b84f-76eba7b40d44"
}
},
"source": [
"We plot the raw observations."
]
},
{
"cell_type": "code",
"metadata": {
"nbpresent": {
"id": "a028040f-a7a6-4b61-904d-48090dcbbe8d"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"for k in k_choices:\n",
"    accuracies = k_to_accuracies[k]\n",
"    plt.scatter([k] * len(accuracies), accuracies)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "6a867f1e-9207-4d0d-adf9-7884532ed06e"
}
},
"source": [
"We plot the mean accuracy for each $k$ as a trend line, with error bars showing the standard deviation across folds."
]
},
{
"cell_type": "code",
"metadata": {
"nbpresent": {
"id": "caf9f446-5155-42db-a69b-46f1d7f06322"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Use the sorted keys for both the x-axis and the statistics so they stay aligned.\n",
"ks = sorted(k_to_accuracies)\n",
"accuracies_mean = np.array([np.mean(k_to_accuracies[k]) for k in ks])\n",
"accuracies_std = np.array([np.std(k_to_accuracies[k]) for k in ks])\n",
"plt.errorbar(ks, accuracies_mean, yerr=accuracies_std)\n",
"plt.title('Cross-validation on k')\n",
"plt.xlabel('k')\n",
"plt.ylabel('Cross-validation accuracy')\n",
"plt.show()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}