SoftData Module
SoftData should be used on data on which you want to perform machine learning (ML) to estimate the values of a set of outputs given a set of inputs. Each column in the table can be specified as an input or an output, or it can be ignored. Currently, the SoftData module supports only numerical data. Together with the mean of the predicted output, SoftData also produces the standard deviation, which can be used as a measure of the accuracy of the model being created.
IMPORTANT:
Keep the following in mind when using the SoftData module:
- Currently, ML training (preparing the ML model) is performed at the first execution of the Problem. If the data is not altered, training does not need to be repeated. Therefore, the first execution of a Problem that includes a SoftData element will take longer than subsequent executions. The time required for training depends on the data and cannot be predicted in advance, so testing with a smaller data set first is recommended.
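One way to follow the recommendation to test with a smaller data set is to create a reduced copy of your CSV locally before uploading it. The sketch below does this with Python's standard `csv` and `random` modules. `subset_csv_text` is a hypothetical helper, not part of SOLVER-AI, and the number of rows to keep is up to you.

```python
import csv
import io
import random


def subset_csv_text(csv_text: str, n_rows: int, seed: int = 0) -> str:
    """Return CSV text with the original header and at most n_rows randomly chosen data rows.

    Hypothetical helper for preparing a smaller training file; not part of SOLVER-AI.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                      # first row: column headers (variable names)
    rows = [row for row in reader if row]      # skip blank lines
    rng = random.Random(seed)                  # fixed seed -> reproducible subset
    if len(rows) > n_rows:
        # Pick row positions without replacement, then sort them to keep the file order.
        keep = sorted(rng.sample(range(len(rows)), n_rows))
        rows = [rows[i] for i in keep]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)                    # keep headers so VariablesStringIn/Out still match
    writer.writerows(rows)
    return out.getvalue()
```

For example, `subset_csv_text(open("C:/test/data.csv").read(), n_rows=500)` returns the text of a 500-row file that can be saved and used for a first, faster training run before switching to the full data set.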
Endpoint
The SoftData endpoint provides a RESTful interface to interact with SoftData objects in the database. It allows authenticated users to list, retrieve, create, update, and delete their own SoftData objects.
The base URL for the SoftData endpoint is https://datamanagerapi.solver-ai.com/api/data/soft-datas/.
HTTP Methods
- GET soft-datas: Retrieves a list of all SoftData objects associated with the authenticated user.
- POST soft-datas: Creates a new SoftData object associated with the authenticated user.
- GET soft-datas/{id}: Retrieves the SoftData object with the specified id.
- PUT soft-datas/{id}: Updates the SoftData object with the specified id.
- PATCH soft-datas/{id}: Updates part of a SoftData object with the specified id.
- DELETE soft-datas/{id}: Deletes the SoftData object with the specified id.
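The methods above map directly onto HTTP requests. The following sketch builds (but does not send) such requests with Python's standard `urllib.request`. Two details are assumptions not confirmed by this page: that the token from the Account page is sent as `Authorization: Token <token>` (the Django REST Framework convention), and that object URLs end with a trailing slash. The helper name `soft_data_request` and any payload field names are hypothetical; refer to the API Clients documentation for the actual field names.

```python
import json
import urllib.request
from typing import Optional, Union

# Base URL from the documentation above.
BASE_URL = "https://datamanagerapi.solver-ai.com/api/data/soft-datas/"


def soft_data_request(method: str, token: str,
                      soft_data_id: Optional[Union[int, str]] = None,
                      payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) a request for the SoftData endpoint.

    Assumptions: token auth uses the header 'Authorization: Token <token>', and
    object URLs end with a trailing slash ('soft-datas/{id}/').
    """
    url = BASE_URL if soft_data_id is None else f"{BASE_URL}{soft_data_id}/"
    headers = {"Authorization": f"Token {token}", "Accept": "application/json"}
    data = None
    if payload is not None:  # POST, PUT and PATCH carry a JSON body
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method.upper())
```

To send a request, pass it to `urllib.request.urlopen` and read the JSON response with `json.load`. A DELETE on a SoftData that is used in a problem is expected to raise `urllib.error.HTTPError` with code 403.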
Data
A SoftData can be set up via the Browsable API (2 - Module Management) or programmatically (see the documentation on the API Clients).
The parameters required for creating a SoftData module via the SoftData Browsable API are:
- Name: Unique name identifying the SoftData.
- Csv: Path to the CSV file.
- VariablesStringIn: Comma-separated list of the input variables (CSV column headers).
- VariablesStringOut: Comma-separated list of the output variables (CSV column headers).
- VectorizationIndices (optional): Indices used for vectorizing the SoftData.
The following is an example of the data when set up through the Browsable API:
- Name: Example Soft Data
- Csv: C:/test/data.csv
- VariablesStringIn: input1, input2
- VariablesStringOut: output1, output2
- VectorizationIndices: 3-5
The data.csv file contains the data on which ML is to be performed. The VariablesStringIn field specifies the headers of the columns to be used as inputs, and the VariablesStringOut field specifies the headers of the columns to be used as outputs. ML is then performed on the data so that, given values for the VariablesStringIn variables, SOLVER-AI can predict the values of the VariablesStringOut variables.
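Before uploading, it can help to check locally that every name in VariablesStringIn and VariablesStringOut is an actual column header, and to see which values would serve as inputs and outputs. The sketch below is not SOLVER-AI code; it only mirrors the behaviour described above: the fields are comma-separated header names, the selected columns become inputs (X) or outputs (Y), and all other columns are ignored.

```python
import csv
import io
from typing import List, Tuple


def parse_variable_list(field: str) -> List[str]:
    """Split a comma-separated field such as 'input1, input2' into header names."""
    return [name.strip() for name in field.split(",") if name.strip()]


def split_inputs_outputs(csv_text: str, variables_in: str, variables_out: str
                         ) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (X, Y): input and output columns as rows of floats; other columns are ignored."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ins, outs = parse_variable_list(variables_in), parse_variable_list(variables_out)
    # Every requested name must match a header exactly.
    missing = [n for n in ins + outs if n not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"headers not found in CSV: {missing}")
    X, Y = [], []
    for row in reader:
        X.append([float(row[n]) for n in ins])
        Y.append([float(row[n]) for n in outs])
    return X, Y
```

Calling `split_inputs_outputs(text, "input1, input2", "output1, output2")` on the example file would raise a `ValueError` naming any misspelled header before you spend time on a training run.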
More on VectorizationIndices
VectorizationIndices work the same way as in the Equation module and all the other modules.
In the example, the VectorizationIndices field is set to '3-5', indicating that this SoftData should be vectorized over the indices 3, 4, and 5. This is equivalent to defining three separate SoftData: 'soft_data_3', 'soft_data_4', and 'soft_data_5'. In each of these, every header string is suffixed with _3, _4, and _5, respectively. As a result:
- soft_data_3 will have:
  - VariablesStringIn: input1_3, input2_3
  - VariablesStringOut: output1_3, output2_3
- soft_data_4 will have:
  - VariablesStringIn: input1_4, input2_4
  - VariablesStringOut: output1_4, output2_4
- soft_data_5 will have:
  - VariablesStringIn: input1_5, input2_5
  - VariablesStringOut: output1_5, output2_5
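The expansion described above can be reproduced locally to predict which variable names a vectorized SoftData will expose. This is a sketch of the naming rule only, assuming the index field uses the 'start-end' form shown in the examples ('3-5', '1-3'); other formats the module may accept are not handled. The function names are hypothetical.

```python
from typing import Dict, List


def parse_indices(spec: str) -> List[int]:
    """Parse a range such as '3-5' into [3, 4, 5]. Assumes the 'start-end' form only."""
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))


def vectorize(name: str, variables_in: List[str], variables_out: List[str],
              spec: str) -> Dict[str, Dict[str, List[str]]]:
    """Expand one SoftData definition into one definition per index.

    Each header gets the suffix _<index>; an empty spec means no vectorization.
    """
    if not spec.strip():
        return {name: {"in": list(variables_in), "out": list(variables_out)}}
    return {
        f"{name}_{i}": {
            "in": [f"{v}_{i}" for v in variables_in],
            "out": [f"{v}_{i}" for v in variables_out],
        }
        for i in parse_indices(spec)
    }
```

With `vectorize("soft_data", ["input1", "input2"], ["output1", "output2"], "3-5")`, the result contains the three definitions listed above, keyed `soft_data_3`, `soft_data_4`, and `soft_data_5`.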
If VectorizationIndices is left empty, the SoftData is not vectorized and remains as defined in the CSV file: the column headers are used as variable names, unchanged.
Standard Deviation
When a SoftData module is created, an additional standard deviation parameter is added for each of the variables specified in VariablesStringOut. This means that if we had parameters such as output1 and output2, we would obtain the additional parameters output1_std and output2_std.
These additional parameters, output1_std and output2_std, represent the standard deviation of the predictions for output1 and output2, respectively. They give a good indication of the level of accuracy we can expect from the predictions of the machine learning model.
If the standard deviation for a variable is small, our predictions for that variable are consistently close to the mean prediction, which suggests that the machine learning model is reliable. Conversely, a large standard deviation means our predictions can differ considerably from the mean, which indicates a higher degree of uncertainty in our predictions.
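This page does not say how SOLVER-AI computes the standard deviation. Purely to illustrate what a small versus a large value means, the sketch below uses an ensemble of predictions (several models predicting the same output for the same input), which is one common way of obtaining such an uncertainty estimate; the prediction values are made up.

```python
import statistics
from typing import List, Tuple


def mean_and_std(member_predictions: List[float]) -> Tuple[float, float]:
    """Summarize several models' predictions for the same input as (mean, standard deviation).

    Illustration only: an ensemble is one common way to estimate uncertainty;
    it is not necessarily how SOLVER-AI computes the *_std outputs.
    """
    return statistics.fmean(member_predictions), statistics.pstdev(member_predictions)


# Hypothetical predictions of output1 from five models for two different inputs.
agreeing = [10.1, 9.9, 10.0, 10.2, 9.8]     # models agree: small std, reliable prediction
disagreeing = [6.0, 14.0, 9.0, 12.0, 9.0]   # models disagree: large std, uncertain prediction
print(mean_and_std(agreeing))
print(mean_and_std(disagreeing))
```

Both inputs have the same mean prediction of 10, but only the first one would be reported with a small output1_std.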
Although these parameters are added automatically, they can be used like any other output. For example, they can be used in constraints to make sure that solutions relying on potentially unreliable predictions are excluded from the results.
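In SOLVER-AI such a constraint is defined in the Problem, and its exact syntax is not covered on this page. The Python sketch below only illustrates the effect of a constraint like output1_std <= 0.5 on a hypothetical list of candidate solutions; the threshold and the data are arbitrary example values.

```python
from typing import Dict, List


def within_std_limit(solutions: List[Dict[str, float]], output: str,
                     max_std: float) -> List[Dict[str, float]]:
    """Keep only solutions satisfying the constraint <output>_std <= max_std."""
    return [s for s in solutions if s[f"{output}_std"] <= max_std]


# Hypothetical candidate solutions; the threshold 0.5 is an arbitrary example value.
candidates = [
    {"output1": 5.0, "output1_std": 0.2},
    {"output1": 9.0, "output1_std": 1.5},  # attractive value, but an uncertain prediction
]
print(within_std_limit(candidates, "output1", max_std=0.5))
```

The second candidate is dropped even though its predicted output1 is higher, because the model is not confident about it.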
VectorizationIndices
Note that if VectorizationIndices were specified (e.g., 1-3), for a variable output we would have:
- Variables:
  - output_1
  - output_2
  - output_3
- Standard Deviation Variables:
  - output_std_1
  - output_std_2
  - output_std_3
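The naming pattern in this list can be written as a small function: without vectorization the standard deviation variable is the output name plus _std, and with vectorization _std is inserted before the index suffix. The function name is hypothetical, and the pattern is inferred from the examples on this page.

```python
from typing import List


def output_variable_names(output: str, indices: List[int]) -> List[str]:
    """List prediction and standard-deviation variable names for one output.

    Without indices: output, output_std.
    With indices: output_<i> for each index, then output_std_<i> for each index.
    """
    if not indices:
        return [output, f"{output}_std"]
    return [f"{output}_{i}" for i in indices] + [f"{output}_std_{i}" for i in indices]
```

For example, `output_variable_names("output", [1, 2, 3])` yields the six names listed above, while `output_variable_names("output1", [])` yields `output1` and `output1_std`.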
CSV File
The CSV file contains a table of values that represent SoftData for a problem. Each column represents a different variable in the problem. The first row of your CSV file must contain headers that represent the names of the columns. These names will be used as variable names in the SoftData module. It’s crucial that this header row accurately reflects the data contained in each column.
Note that in the current version the columns can only contain real values.
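Because only real values are accepted, it can save a failed training run to scan the file for text, empty cells, NaN or infinite values, and rows with the wrong number of fields before uploading. The following is a local sketch using Python's `csv` and `math` modules; the function name and the report format are hypothetical.

```python
import csv
import io
import math
from typing import List


def find_non_numeric_cells(csv_text: str) -> List[str]:
    """Describe cells that are not finite real numbers, and rows of the wrong length.

    Row numbers count the header as row 1, matching what a spreadsheet shows.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row_number, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore completely blank lines
        if len(row) != len(header):
            problems.append(f"row {row_number}: expected {len(header)} values, got {len(row)}")
        for column, value in zip(header, row):
            try:
                ok = math.isfinite(float(value))  # rejects 'nan' and 'inf' as well as text
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"row {row_number}, column {column}: {value!r}")
    return problems
```

An empty list means every data cell parsed as a finite number.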
For illustration purposes, let's consider the following example. Suppose you're tasked with designing a car and have data on a variety of cars for which you know the range per unit of energy (range_unit_energy, in km per kWh). The range could then be estimated via an Equation as
range = battery_capacity × range_unit_energy (kWh × km/kWh = km)
A potential CSV file for this scenario would be the following:
motor_power_kW | battery_capacity_kWh | total_weight_kg | range_unit_energy |
---|---|---|---|
100 | 50 | 1500 | 3.4 |
150 | 75 | 1400 | 3.6 |
200 | 100 | 1300 | 3.8 |
… | … | … | … |
with:
- VariablesStringIn: motor_power_kW, battery_capacity_kWh, total_weight_kg
- VariablesStringOut: range_unit_energy
Permissions
Only authenticated users can interact with this endpoint, either via the API page or programmatically using a token, which can be obtained from the Account page.
All SoftData objects created are associated with the authenticated user and are not accessible to other users.
Notes
- If you attempt to delete a SoftData that is used in a problem, the request will be denied with a 403 Forbidden status code.
- For information on setting up a SoftData programmatically, see the documentation on the API Clients.