%% ME 2016 %% Fall 2007 %% %% AUTHOR: Chris Paredis %% %% PURPOSE: create a model for a car engine with torque converter through %% curve fitting %% %% This script defines the engine data in two vectors, rpm and torque, and %% creates several models based on curve fitting and then presents the %% results in a plot %% %% ASSUMPTIONS: %% - the given data points are exact measurements %% %% define the problem %% % to be on the safe side, clear all the variables clear all; format long g; warning off; % get the engine data for the engine with torque converter rpm = (500:500:6500); torque = [455 375 313 271 240 237 233 226 216 200 173 125 51]; poly_order = [1 3 6 8 12]; % initialize the vector of rpm values used to plot the curves % 1000 points will definitely be enough to create a smooth curve rpm_vector = linspace(-2000,8000,1000)'; torque_plot = zeros(length(rpm_vector),length(poly_order)); %% %% solve the problem %% % loop over all the polynomial orders for i=1:length(poly_order) % compute the polynomial coefficients [poly, S, mu] = polyfit(rpm, torque, poly_order(i)); % evaluate the polynomial densely for a smooth plot torque_plot(:,i) = polyval(poly, rpm_vector, [], mu); % add a label legend_labels{i} = sprintf('order = %d',poly_order(i)); disp([poly_order(i) sum((torque-polyval(poly,rpm,[],mu)).^2)]) end %% %% interpret the results %% figure(1) plot(rpm_vector, torque_plot, rpm, torque, '*'); grid on; xlabel('Engine Speed [RPM]'); ylabel('Engine Torque [Nm]'); title({'Different Engine Models using Polynomial Curve Fitting';'(ME 2016 - Chris Paredis)'}); legend_labels{length(poly_order)+1} = 'raw data'; legend(legend_labels); axis([-2000 8000 -50 600]); %% the code below computes the Leave-One-Out Cross Validation Metric %% (this is not part of the assignment, but is given just for your %% information) % initialize the mean squared error mse = zeros(1,length(poly_order)); for i=1:length(poly_order) y_error = zeros(size(rpm)); for j=1:length(rpm); % remove the j-th data point reduced_rpm = rpm; reduced_rpm(j) = []; reduced_torque = torque; reduced_torque(j) = []; % fit the data [poly,S,mu] = polyfit(reduced_rpm,reduced_torque,poly_order(i)); % store the error at data point j y_error(j) = polyval(poly, rpm(j),[],mu) - torque(j); end mse(i) = sum(y_error.*y_error); end disp(' order mse'); disp([poly_order', mse']) % revert back to the default formatting format short %% Answers to interpretation questions: %% Question 1: Do you notice any characteristics in the models that you %% know cannot possibly match the behavior of a real car engine? % yes, there are several artifacts that look very suspicious. % For order=1 -- based on the experimental data, it is clear that % this is not a good fit. Although one can expect some error in the % measurements, the size of the error for this fit is much larger than can % be explained solely based on measurement error. % For order=12 -- the oscillations in the model at RPMs below 1000 and above % 6000 are clearly inappropriate. These oscillations are due to % "over-fitting" -- the order is too high. % For order=8 -- the problem of over-fitting is not as bad as for 12th % order, but one can still see that for RPM > 7000, the torque shoots up % very quickly. This is a problem because it is still within the range of % RPMs that some engines may operate in, creating a real risk for erroneous % results. % For order=6 -- a similar effect as for 8th order occurs but this time % outside the range of possible RPM values: RPM < 0. Since there is no % risk that anybody would use the model in this range, this artifact is not % really a problem. For any of the RPM values of interest, this model % provides a very good approximation. % For order=3 -- very good also. no unrealistic artifacts. I turns out % that according to the LOOCV metric (see below) this is actually the best % choice. %% Question 2: Which metrics would you use to determine the goodness of %% fit? % One first choice would be Sr, the sum of the squared errors. Clearly, if % Sr is large, then the model is no good -- it doesn't predict the data % points well. For the first order polynomial, it would be better to use % correlation rather than Sr because it is a dimensionless quantity that is % easier to interpret than Sr. % However, when Sr becomes small (e.g. for higher orders), it becomes a % poor indicator for goodness of fit. For instance, for 12th order, Sr % equals zero! --> a perfect fit? As was discussed above, clearly not. % To guard against over-fitting, the best metric would be the sum of squared % errors obtained through the Leave-One-Out Cross-Validation approach as % disucssed in class. This LOOCV, will penalize both 8th order and 12th % order because they over-fit the data (Note: strictly speaking the LOOCV % metric for the 12th order polynomial cannot be evaluated because there % are too few data points.) The values for the LOOCV mean squared error % are: % order LOOCV mse % 1 2.321964734004017e+004 % 3 3.082906812762590e+002 --> smallest! % 6 6.496838669143775e+002 % 8 3.480900612476277e+003 % 1 2.693099734549461e+007 % % if you don't want to go through the effort of compiling the LOOCV metric, % you can use a qualitative assessment as was done in the answer for % question 1. %% Question 3: Which model would you recommend to be used in a car %% simulation? Justify your answer. % based on the considerations listed above, I would choose either the 3rd % or the 6th order polynomial -- they provide close fits without % over-fitting. Strictly speaking, the 3rd order wins by a hair based on % the LOOCV metric. The difference is small and both 3rd and 6th would be % good choices here.