SoftData Module

SoftData is intended for data on which you want to perform machine learning (ML) in order to estimate the values of a set of outputs from a set of inputs. Each column in the table can be specified as an input or an output, or can be ignored. Currently, the SoftData module supports numerical data only. Along with the mean of each predicted output, SoftData also produces the standard deviation, which can be used as a measure of the accuracy of the model being created.

IMPORTANT:

There are two considerations that need to be kept in mind when using the SoftData module.

  • Currently, ML training (preparing the ML model) is performed at the first execution of the Problem. If the data is not altered, training does not need to be repeated. The first execution of a Problem that includes a SoftData element will therefore take longer than subsequent executions. The time required for training depends on the data and cannot be predicted, so testing with a smaller data set first is recommended.
  • During execution of a Problem that includes objectives and constraints (see Problem for more details), if the input parameters to the SoftData model are not set as constant, the SoftData predictions might be called many times, increasing the computation time of the Problem.

Endpoint

The SoftData endpoint provides a RESTful interface for interacting with SoftData objects in the database. It allows authenticated users to create, retrieve, update, and delete their own SoftData objects.

The base URL for the SoftData endpoint is https://datamanagerapi.solver-ai.com/api/data/soft-datas/.


HTTP Methods

  • GET soft-datas: Retrieves a list of all SoftData objects associated with the authenticated user.
  • POST soft-datas: Creates a new SoftData object associated with the authenticated user.
  • GET soft-datas/{id}: Retrieves the SoftData object with the specified id.
  • PUT soft-datas/{id}: Updates the SoftData object with the specified id.
  • PATCH soft-datas/{id}: Updates part of a SoftData object with the specified id.
  • DELETE soft-datas/{id}: Deletes the SoftData object with the specified id.
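
The methods listed above could be exercised from Python roughly as follows. This is a minimal sketch using only the standard library; it builds the requests without sending them. The 'Authorization: Token ...' header scheme, the helper name build_request, the placeholder token, and the JSON key "name" are assumptions that this page does not confirm; the API Clients documentation should be checked for the actual scheme and field names.

```python
import json
import urllib.request

BASE_URL = "https://datamanagerapi.solver-ai.com/api/data/soft-datas/"


def build_request(method, token, object_id=None, payload=None):
    """Build (but do not send) a request for the SoftData endpoint.

    Assumption: the token is passed as 'Authorization: Token <token>'.
    """
    url = BASE_URL if object_id is None else f"{BASE_URL}{object_id}"
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Token {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# List all SoftData objects, then partially update the one with id 7.
# The payload key "name" is hypothetical.
list_request = build_request("GET", "YOUR_TOKEN")
patch_request = build_request("PATCH", "YOUR_TOKEN", object_id=7,
                              payload={"name": "Renamed Soft Data"})
print(list_request.get_method(), list_request.full_url)
print(patch_request.get_method(), patch_request.full_url)
```

A request built this way would be sent with urllib.request.urlopen(request); that step is left out here so the sketch runs without network access or a valid token.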

Data

A SoftData can be set up via the Browsable API (2 - Module Management) or programmatically (see the documentation on the API Clients).

The parameters required for creating a SoftData module via the SoftData Browsable API are:

  • Name: Unique name identifying the SoftData.
  • Csv: Path to the csv file.
  • VariablesStringIn: List of the input variables.
  • VariablesStringOut: List of the output variables.
  • VectorizationIndices (optional): Indices used for vectorizing the SoftData.

The following is an example of the data when set up through the Browsable API:

Name: Example Soft Data
Csv: C:/test/data.csv
VariablesStringIn: input1, input2
VariablesStringOut: output1, output2
VectorizationIndices: 3-5

The data.csv file contains the data on which ML has to be performed. The VariablesStringIn field specifies the headers of the columns to be used as inputs. The VariablesStringOut field specifies the headers of the columns to be used as outputs. ML is then performed on the data so that given values for VariablesStringIn, SOLVER-AI can predict the values of VariablesStringOut.
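
Because VariablesStringIn and VariablesStringOut must name column headers of the CSV file, it can be useful to check them locally before creating the SoftData. The following is a minimal sketch of such a check, assuming a comma-delimited file; the helper names split_variables and missing_headers and the sample file content are hypothetical and not part of SOLVER-AI.

```python
import csv
import io


def split_variables(variables_string):
    """Split a comma-separated variables string into a list of clean names."""
    return [name.strip() for name in variables_string.split(",") if name.strip()]


def missing_headers(csv_text, variables_in, variables_out):
    """Return the requested variable names that are not headers of the CSV."""
    header = next(csv.reader(io.StringIO(csv_text)))
    available = {column.strip() for column in header}
    requested = split_variables(variables_in) + split_variables(variables_out)
    return [name for name in requested if name not in available]


# Hypothetical file content matching the example fields above.
sample_csv = "input1,input2,output1,output2\n1.0,2.0,3.0,4.0\n"
print(missing_headers(sample_csv, "input1, input2", "output1, output2"))  # []
print(missing_headers(sample_csv, "input1, input3", "output1"))  # ['input3']
```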

More on VectorizationIndices

VectorizationIndices works in the same way for SoftData as it does for the Equation and all other modules.

In the example, the VectorizationIndices field is set to '3-5', indicating that this soft data should be vectorized over the indices 3, 4, and 5. This is equivalent to defining three separate SoftData: 'soft_data_3', 'soft_data_4', and 'soft_data_5'. For each of these, all of the header strings are appended with _3, _4, and _5, respectively. As a result:

  • soft_data_3 will have:
    • VariablesStringIn: input1_3, input2_3
    • VariablesStringOut: output1_3, output2_3
  • soft_data_4 will have:
    • VariablesStringIn: input1_4, input2_4
    • VariablesStringOut: output1_4, output2_4
  • soft_data_5 will have:
    • VariablesStringIn: input1_5, input2_5
    • VariablesStringOut: output1_5, output2_5

If VectorizationIndices is left empty, the SoftData will not be vectorized and will remain as defined in the CSV file. As a result, all of the headers of the CSV file will be used as variables with their strings unchanged.
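
The naming rule described above can be sketched in a few lines of Python. This is an illustration of the suffixing behaviour only, not the platform's implementation; the function names parse_indices and vectorize are hypothetical, and only the 'start-end' form shown in the example is handled.

```python
def parse_indices(vectorization_indices):
    """Expand a range string such as '3-5' into [3, 4, 5].

    Assumption: only the 'start-end' form shown in the example is handled;
    an empty string means no vectorization.
    """
    text = vectorization_indices.strip()
    if not text:
        return []
    start, end = (int(part) for part in text.split("-"))
    return list(range(start, end + 1))


def vectorize(name, variables, vectorization_indices):
    """Map each vectorized SoftData name to its suffixed variable names."""
    indices = parse_indices(vectorization_indices)
    if not indices:
        # No vectorization: names are used unchanged.
        return {name: list(variables)}
    return {f"{name}_{i}": [f"{v}_{i}" for v in variables] for i in indices}


expanded = vectorize("soft_data", ["input1", "input2"], "3-5")
for soft_data_name, names in expanded.items():
    print(soft_data_name, names)
```

Calling vectorize with an empty VectorizationIndices string returns the variable names unchanged, which mirrors the non-vectorized case.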


Standard Deviation

When creating a SoftData module, for each of the variables specified in VariablesStringOut, an additional Standard Deviation parameter will be added. This means that if we had parameters like output1 and output2, we would obtain additional parameters such as output1_std and output2_std.

These additional parameters, output1_std and output2_std, represent the standard deviation of the predictions for output1 and output2 respectively. They give a good indication of the level of accuracy that can be expected from the predictions of the machine learning model.

If the standard deviation for a variable is small, the predictions for that variable are consistently close to the mean prediction, which suggests that the machine learning model is quite reliable. A large standard deviation, on the other hand, means the predictions can differ considerably from the mean, which suggests a higher degree of uncertainty.

Although these parameters are added automatically, they can be used like any other output. For example, they can be used in constraints to make sure that solutions based on potentially unreliable predictions are excluded from the results.
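
The effect of constraining a standard deviation output can be illustrated with a small Python sketch. On the platform this would be expressed as a constraint in the Problem; the code below only mimics the outcome locally. The function name reliable_solutions, the threshold, and the predicted values are all hypothetical and chosen purely for illustration.

```python
def reliable_solutions(solutions, std_limits):
    """Keep only solutions whose *_std values are within the given limits."""
    return [
        solution for solution in solutions
        if all(solution[name] <= limit for name, limit in std_limits.items())
    ]


# Hypothetical predicted values, for illustration only.
candidates = [
    {"output1": 10.2, "output1_std": 0.3},
    {"output1": 11.5, "output1_std": 2.7},
]
kept = reliable_solutions(candidates, {"output1_std": 1.0})
print(kept)
```

Only the first candidate is kept, because the standard deviation of the second exceeds the chosen limit of 1.0.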

VectorizationIndices

Note that if VectorizationIndices (e.g., 1-3) were specified, for a variable named output we would have:

  • Variables:
    • output_1
    • output_2
    • output_3
  • Standard Deviation Variables
    • output_std_1
    • output_std_2
    • output_std_3
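
The naming pattern for the standard deviation variables, with and without vectorization, can be summarized with a short sketch. The function name std_names is hypothetical; the pattern itself (the _std suffix first, then the index) is taken from the lists on this page.

```python
def std_names(output, indices=None):
    """Return the standard deviation variable names for one output variable."""
    if not indices:
        # Not vectorized: a single '<output>_std' parameter is added.
        return [f"{output}_std"]
    # Vectorized: the index is appended after the '_std' suffix.
    return [f"{output}_std_{i}" for i in indices]


print(std_names("output1"))
print(std_names("output", [1, 2, 3]))
```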

CSV File

The CSV file contains a table of values that represent SoftData for a problem. Each column represents a different variable in the problem. The first row of your CSV file must contain headers that represent the names of the columns. These names will be used as variable names in the SoftData module. It’s crucial that this header row accurately reflects the data contained in each column.

Note that in the current version the columns can only contain real values.

For illustration purposes, consider the following example. Suppose you are tasked with designing a car and have data on a variety of different cars for which you know the range per unit of energy (range_unit_energy, in km per kWh) and for which the range can be estimated via an Equation as

range = motor_power × battery_capacity × range_unit_energy 

A potential CSV file for this scenario would be the following:

motor_power_kW  battery_capacity_kWh  total_weight_kg  range_unit_energy
100             50                    1500             3.4
150             75                    1400             3.6
200             100                   1300             3.8

with:

  • VariablesStringIn: motor_power_kW, battery_capacity_kWh, total_weight_kg
  • VariablesStringOut: range_unit_energy
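
Since the columns can only contain real values, a quick local check of the file before creating the SoftData may save a failed training run. The following minimal sketch writes the example table as comma-separated text (the delimiter is an assumption) and reports any cell that is not a finite real number; the function name non_numeric_cells is hypothetical.

```python
import csv
import io
import math

# The example table above, written as comma-separated text (assumed delimiter).
CAR_CSV = """motor_power_kW,battery_capacity_kWh,total_weight_kg,range_unit_energy
100,50,1500,3.4
150,75,1400,3.6
200,100,1300,3.8
"""


def non_numeric_cells(csv_text):
    """Return (row_number, header, value) for each cell that is not a finite real."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for header, value in row.items():
            try:
                is_real = math.isfinite(float(value))
            except (TypeError, ValueError):
                is_real = False
            if not is_real:
                problems.append((row_number, header, value))
    return problems


print(non_numeric_cells(CAR_CSV))
```

An empty list means every cell could be parsed as a finite real number.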

Permissions

Only authenticated users can interact with this endpoint. Authentication can be done via the API page or programmatically via a token, which can be obtained from the Account page.

All SoftData created will be associated with the authenticated user and will not be accessible by other users.


Notes

  • If you attempt to delete a SoftData that is used in a problem, the request will be denied with a 403 Forbidden status code.
  • For information on setting up a SoftData programmatically, see the documentation on the API Clients.

SOLVER-AI ® is a registered trademark in the UK.
Copyright © 2022-2024 SOLVER-AI Ltd. All Rights Reserved.