====== SoftData Module ======

//SoftData// should be used on data on which you want to perform machine learning (ML) in order to estimate the values of a set of outputs given a set of inputs. Each column in the table can be specified as an input or an output, or can be ignored. Currently, the //SoftData// module supports only numerical data.

Together with the mean of the predicted output, //SoftData// also produces the standard deviation, which can be used as a measure of the accuracy of the model being created.

//**IMPORTANT**//: There are two considerations to keep in mind when using the //SoftData// module.

  * Currently, ML training (preparing the ML model) is performed at the first execution of the [[api_reference:problem|Problem]]. If the data is not altered, training does not need to be repeated. Therefore, the first execution of a [[api_reference:problem|Problem]] that includes a //SoftData// element will take longer than subsequent executions. The time required for training depends on the data and cannot be predicted, so testing with a smaller data set first is recommended.
  * During execution of a [[api_reference:problem|Problem]] that includes objectives and constraints (see [[api_reference:problem|Problem]] for more details), if the input parameters to the //SoftData// model are not set as constant, the //SoftData// predictions might be evaluated many times, increasing the computation time of the [[api_reference:problem|Problem]].

----

===== Endpoint =====

The //SoftData// endpoint provides a RESTful interface to interact with //SoftData// objects in the database. It allows authenticated users to create, retrieve, update, and delete their own //SoftData// objects.

The base URL for the //SoftData// endpoint is //[[https://datamanagerapi.solver-ai.com/api/data/soft-datas/]]//.

----

===== HTTP Methods =====

  * **GET** //soft-datas//: Retrieves a list of all SoftData objects associated with the authenticated user.
  * **POST** //soft-datas//: Creates a new SoftData object associated with the authenticated user.
  * **GET** //soft-datas/{id}//: Retrieves the SoftData object with the specified //id//.
  * **PUT** //soft-datas/{id}//: Updates the SoftData object with the specified //id//.
  * **PATCH** //soft-datas/{id}//: Updates part of the SoftData object with the specified //id//.
  * **DELETE** //soft-datas/{id}//: Deletes the SoftData object with the specified //id//.

----

===== Data =====

A //SoftData// can be set up via the [[https://www.solver-ai.com/api|Browsable API]] (2 - Module Management) or programmatically (more on this [[:api_clients|here]]).

The parameters required for creating a //SoftData// module via the [[https://datamanagerapi.solver-ai.com/api/data/soft-datas/|SoftData Browsable API]] are:

  * **Name**: Unique name identifying the //SoftData//.
  * **Csv**: Path to the CSV file.
  * **VariablesStringIn**: List of the input variables.
  * **VariablesStringOut**: List of the output variables.
  * **VectorizationIndices (optional)**: Indices used for vectorizing the //SoftData//.

The following is an example of the data when set up through the Browsable API:

  Name: Example Soft Data
  Csv: C:/test/data.csv
  VariablesStringIn: input1, input2
  VariablesStringOut: output1, output2
  VectorizationIndices: 3-5

The //data.csv// file contains the data on which ML is to be performed. The //VariablesStringIn// field specifies the headers of the columns to be used as inputs. The //VariablesStringOut// field specifies the headers of the columns to be used as outputs. ML is then performed on the data so that, given values for //VariablesStringIn//, SOLVER-AI can predict the values of //VariablesStringOut//.

**More on VectorizationIndices**

The //VectorizationIndices// work in the same way as in the [[api_reference:equation|Equation]] and all the other modules.
In the example, the //VectorizationIndices// field is set to '3-5', indicating that this soft data should be vectorized over the indices 3, 4, and 5. This is equivalent to defining three separate //SoftData//: 'soft_data_3', 'soft_data_4', and 'soft_data_5'. For each of these, all of the header strings are suffixed with _3, _4, and _5, respectively. As a result:

  * soft_data_3 will have:
    * VariablesStringIn: input1_3, input2_3
    * VariablesStringOut: output1_3, output2_3
  * soft_data_4 will have:
    * VariablesStringIn: input1_4, input2_4
    * VariablesStringOut: output1_4, output2_4
  * soft_data_5 will have:
    * VariablesStringIn: input1_5, input2_5
    * VariablesStringOut: output1_5, output2_5

If //VectorizationIndices// is left empty, the //SoftData// will not be vectorized and will remain as defined in //Csv//. As a result, all of the headers of the CSV file will be used as variables with the strings unchanged.

----

===== Standard Deviation =====

When a //SoftData// module is created, an additional standard deviation parameter is added for each of the variables specified in //VariablesStringOut//. For example, for the outputs ''output1'' and ''output2'', the additional parameters ''output1_std'' and ''output2_std'' are created. They represent the standard deviation of the predictions for ''output1'' and ''output2'', respectively, and give an indication of the level of accuracy that can be expected from the machine learning model.

If the standard deviation for a variable is small, the predictions for that variable are consistently close to the mean prediction, which suggests that the model is quite reliable. A large standard deviation, on the other hand, means the predictions can differ considerably from the mean, which suggests a higher degree of uncertainty in the predictions.
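
As a rough illustration of how the ''_std'' parameters might be read, the following Python sketch flags outputs whose standard deviation exceeds a chosen threshold. The function name ''unreliable_outputs'', the threshold, and the numeric values are hypothetical and are not part of the SOLVER-AI API; only the ''_std'' suffix follows the naming described in this section.

```python
def unreliable_outputs(prediction: dict[str, float], max_std: float) -> list[str]:
    """Return the outputs whose ``<name>_std`` value exceeds ``max_std``."""
    suffix = "_std"
    return sorted(
        name[: -len(suffix)]  # strip the suffix to recover the output name
        for name, value in prediction.items()
        if name.endswith(suffix) and value > max_std
    )


# Made-up prediction values, for illustration only.
prediction = {
    "output1": 3.5,
    "output1_std": 0.1,
    "output2": 12.0,
    "output2_std": 4.2,
}
flagged = unreliable_outputs(prediction, max_std=1.0)
print(flagged)
```

An absolute threshold only makes sense relative to the scale of each output, so in practice a separate limit per output may be more appropriate.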
Although these parameters are added automatically, they can be used like any other output. For example, they can be used in constraints to ensure that solutions based on potentially unreliable predictions are excluded from the results.

**VectorizationIndices**

Note that if //VectorizationIndices// (e.g., 1-3) were specified, for a variable ''output'' we would have:

  * Variables:
    * ''output_1''
    * ''output_2''
    * ''output_3''
  * Standard Deviation Variables:
    * ''output_std_1''
    * ''output_std_2''
    * ''output_std_3''

----

===== CSV File =====

The CSV file contains a table of values that represent //SoftData// for a problem. Each column represents a different variable in the problem.

The first row of the CSV file must contain headers that name the columns. These names are used as variable names in the //SoftData// module, so it is crucial that the header row accurately reflects the data contained in each column. Note that in the current version the columns can only contain real values.

For illustration purposes, consider the following example. Suppose you are tasked with designing a car and have data on a variety of different cars for which you know the range per unit of energy (//range_unit_energy//, in km per kWh), from which the range could be estimated via an //Equation// as

  range = battery_capacity_kWh × range_unit_energy

A potential CSV file for this scenario would be the following:

^ motor_power_kW ^ battery_capacity_kWh ^ total_weight_kg ^ range_unit_energy ^
| 100 | 50 | 1500 | 3.4 |
| 150 | 75 | 1400 | 3.6 |
| 200 | 100 | 1300 | 3.8 |
| ... | ... | ... | ... |

with:

  * //VariablesStringIn//: motor_power_kW, battery_capacity_kWh, total_weight_kg
  * //VariablesStringOut//: range_unit_energy

----

===== Permissions =====

Only authenticated users can interact with this endpoint. Authentication can be done via the [[https://www.solver-ai.com/api|API]] page or programmatically via a token, which can be obtained from the [[https://www.solver-ai.com/accountmanagement|Account]] page. All //SoftData// objects created are associated with the authenticated user and are not accessible by other users.

----

===== Notes =====

  * If you attempt to delete a //SoftData// that is used in a problem, the request will be denied with a 403 Forbidden status code.
  * For information on setting up a //SoftData// programmatically, see the documentation on the [[:api_clients|API Clients]].
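
The 403 response mentioned in the first note could be handled in client code roughly as follows. This is a minimal Python sketch using only the standard library, not the official client: the ''Token'' authorization scheme, the trailing slash on the URL, and the function names are assumptions, so consult the API Clients documentation for the supported approach.

```python
import urllib.error
import urllib.request

BASE_URL = "https://datamanagerapi.solver-ai.com/api/data/soft-datas/"


def build_delete_request(soft_data_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one SoftData object."""
    return urllib.request.Request(
        f"{BASE_URL}{soft_data_id}/",  # assumption: URLs end with a slash
        method="DELETE",
        # Assumption: the token is sent as "Token <value>".
        headers={"Authorization": f"Token {token}"},
    )


def delete_soft_data(soft_data_id: int, token: str) -> bool:
    """Return True if deleted, False if the API refused with 403 Forbidden."""
    request = build_delete_request(soft_data_id, token)
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as error:
        if error.code == 403:
            # Per the note above: the SoftData is still used in a problem.
            return False
        raise  # any other HTTP error is left to the caller
```

Keeping the request construction separate from the network call makes it possible to inspect the URL, method, and headers without contacting the server.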