Model
WekaModel Configuration
ALGORITHM 1. JRip
No. | Options | Description |
---|---|---|
1 | -F <number of folds> | |
2 | -N <min. weights> | |
3 | -O <number of runs> | |
4 | -D | |
5 | -S <seed> | |
6 | -E | |
7 | -P | |
ALGORITHM 2. Decision Table
No. | Options | Description |
---|---|---|
1 | -S <search method specification> | |
1.1 | -P <start set> | Specify a starting set of attributes. E.g., 1,3,5-7 |
1.2 | -D <0 = backward \| 1 = forward \| 2 = bi-directional> | Direction of search (default = 1) |
1.3 | -N <num> | Number of non-improving nodes to consider before terminating search |
1.4 | -S <num> | |
2 | -X <number of folds> | |
3 | -E <acc \| rmse \| mae \| auc> | Performance evaluation measure to use for selecting attributes. (Default = accuracy for discrete class and rmse for numeric class) |
4 | -I | Use nearest neighbor instead of global table majority |
5 | -R | Display decision table rules |
ALGORITHM 3. PART
No. | Options | Description |
---|---|---|
1 | -C <pruning confidence> | |
2 | -M <minimum number of objects> | |
3 | -R | Use reduced error pruning |
4 | -N <number of folds> | |
5 | -B | Use binary splits only |
6 | -U | Generate unpruned decision list |
7 | -J | Do not use MDL correction for info gain on numeric attributes |
8 | -Q <seed> | Seed for random data shuffling (default 1) |
9 | -doNotMakeSplitPointActualValue | Do not make split point actual value |
ALGORITHM 4. J48
No. | Options | Description |
---|---|---|
1 | -U | Use unpruned tree |
2 | -O | Do not collapse tree |
3 | -C <pruning confidence> | |
4 | -M <minimum number of instances> | |
5 | -R | Use reduced error pruning |
6 | -N <number of folds> | |
7 | -B | Use binary splits only |
8 | -S | Don’t perform subtree raising. |
9 | -L | Do not clean up after the tree has been built |
10 | -A | Laplace smoothing for predicted probabilities |
11 | -J | Do not use MDL correction for info gain on numeric attributes |
12 | -Q <seed> | |
13 | -doNotMakeSplitPointActualValue | Do not make split point actual value |
ALGORITHM 5. RandomForest
No. | Options | Description |
---|---|---|
1 | -P | |
2 | -O | Calculate the out of bag error |
3 | -store-out-of-bag-predictions | Whether to store out of bag predictions in internal evaluation object |
4 | -output-out-of-bag-complexity-statistics | Whether to output complexity-based statistics when out-of-bag evaluation is performed |
5 | -print | Print the individual classifiers in the output |
6 | -attribute-importance | Compute and output attribute importance (mean impurity decrease method) |
7 | -I <num> | |
8 | -num-slots <num> | |
9 | -K <number of attributes> | |
10 | -M <minimum number of instances> | |
11 | -V <minimum variance for split> | |
12 | -S <num> | |
13 | -depth <num> | |
14 | -N <num> | |
15 | -U | Allow unclassified instances |
16 | -B | Break ties randomly when several attributes look equally good |
17 | -output-debug-info | If set, classifier is run in debug mode and may output additional info to the console |
18 | -do-not-check-capabilities | |
19 | -num-decimal-places | |
20 | -batch-size | |
ALGORITHM 6. RandomTree
No. | Options | Description |
---|---|---|
1 | -K <number of attributes> | |
2 | -M <minimum number of instances> | |
3 | -V <minimum variance for split> | |
4 | -S <num> | |
5 | -depth <num> | |
6 | -N <num> | |
7 | -U | Allow unclassified instances |
8 | -B | Break ties randomly when several attributes look equally good |
9 | -output-debug-info | If set, classifier is run in debug mode and may output additional info to the console |
10 | -do-not-check-capabilities | |
11 | -num-decimal-places | |
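The options in the tables above follow Weka's command-line convention: a flag either stands alone (for example `-U`) or is followed by a value (for example `-M <minimum number of instances>`). The following is a minimal sketch of assembling such an option list, assuming the options are held in a plain dictionary. The helper name `build_weka_options` and the sample values are illustrative only; they are not platform defaults, and this document does not specify how the platform itself stores these options.

```python
def build_weka_options(options):
    """Flatten a dict of Weka-style options into a list of option tokens.

    A value of True emits the bare flag, False or None omits the flag,
    and any other value is emitted as the flag followed by str(value).
    """
    tokens = []
    for flag, value in options.items():
        if value is True:
            tokens.append(flag)
        elif value is False or value is None:
            continue
        else:
            tokens.extend([flag, str(value)])
    return tokens


# Illustrative J48 settings (example values only, not documented defaults)
j48_options = build_weka_options({"-C": 0.25, "-M": 2, "-U": False, "-B": True})
print(" ".join(j48_options))
```

Keeping switches as booleans and valued options as numbers or strings makes it harder to emit a valued option such as `-M` without its argument.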
Custom Model Configuration
ALGORITHM 7. Custom Model
The following table details the essential fields and their descriptions for configuring a custom model. This configuration applies to Python-based custom models that need to communicate with a Java environment using JSON format for data exchange.
No. | Fields | Description |
---|---|---|
1 | Module Name | The file name of the main module, with its extension, in the custom model that you upload. This module must include the specified Method Name, which acts as the entry point for data evaluation. For example, if your custom model has a file named |
2 | Method Name | The name of the method in the main module that will process or evaluate the data. This method should be implemented to receive input data in JSON format, perform the necessary computations or evaluations, and return the results in JSON format. Ensure the method name matches exactly as defined in the code (case-sensitive). For example, if the method that handles the data is called |
3 | Custom Files | |
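Before uploading, it can help to check locally that the Module Name and Method Name resolve to a callable entry point. The sketch below is one way to do that, under stated assumptions: it is a local smoke test only, not a description of how the platform loads modules, and the helper `call_entry_point` and the file name `my_model.py` are hypothetical. The method name `evaluate_data` follows the Python example in this document.

```python
import importlib.util
import json
import os
import tempfile


def call_entry_point(module_path, method_name, payload):
    """Load a Python file as a module and call one of its functions with JSON text."""
    spec = importlib.util.spec_from_file_location("custom_model", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # getattr raises AttributeError if the name does not match exactly (case-sensitive)
    method = getattr(module, method_name)
    return json.loads(method(json.dumps(payload)))


# Hypothetical module written to a temporary directory for this demonstration
source = (
    "import json\n"
    "def evaluate_data(input_json):\n"
    "    json.loads(input_json)\n"
    "    return json.dumps({'CLASS': 'demo'})\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_model.py")
    with open(path, "w") as handle:
        handle.write(source)
    result = call_entry_point(path, "evaluate_data", {"param1": "value1"})

print(result)
```

A misspelled or differently cased method name fails at the `getattr` call, which is the kind of mismatch the Method Name field warns about.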
JSON Input/Output Standard
To ensure smooth communication between Python and Java, both environments must agree on the format of the input and output data. Below are the specifications for the JSON format:
Input JSON Structure: The input to the Python model should be a well-structured JSON object containing the data to be processed. The structure will vary depending on the specific requirements of the model but must follow a standard pattern so that the Java environment can easily generate and pass it to the Python model. For example:
{ "param1": "value1", "param2": "value2" }
Output JSON Structure: After the Python model processes the data, it will return the results in a JSON object format. The output will need to follow a consistent structure so the Java application can handle the results properly. The output fields must exactly match the field names used in the evaluation datasource in the database, ensuring that the results can be correctly mapped to the database.
CLASS: The output class field is required and must be included. The field name must match exactly (case-sensitive) the CLASS field used in the evaluation datasource.
probability and matchingRule: These fields are optional and should only be included if the model’s logic supports them. If included, their field names must also match exactly (case-sensitive) with the names used in the evaluation datasource.
For example:
{ "CLASS": "result_class_value", "probability": 0.85, "matchingRule": "some_rule" }
Example Python Code to Handle JSON Input/Output:
Here is an example of how the Python code can be structured to handle JSON input and output:
    import json


    def evaluate_data(input_json):
        # Parse the incoming JSON data
        input_data = json.loads(input_json)

        # Perform model computations (Example)
        result = {
            "CLASS": "result_class_value",  # Mandatory field
            "probability": 0.85,            # Optional field, only if logic supports it
            "matchingRule": "some_rule",    # Optional field, only if logic supports it
        }

        # Convert result to JSON and return it
        return json.dumps(result, indent=4)
This code ensures that the output JSON includes the CLASS field, as well as the optional probability and matchingRule fields, if they are part of the model’s logic. The output field names must match exactly (case-sensitive) the names used in the evaluation table in the database.
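The field-name rules above can be checked before a model is uploaded. The following is a minimal sketch, assuming the evaluation datasource uses the names `CLASS`, `probability`, and `matchingRule` as in the example; the function `check_output` is a hypothetical helper, not part of the platform.

```python
import json


def check_output(output_json, class_field="CLASS",
                 optional_fields=("probability", "matchingRule")):
    """Return a list of problems found in a custom model's JSON output."""
    result = json.loads(output_json)
    if not isinstance(result, dict):
        return ["output must be a JSON object"]
    problems = []
    if class_field not in result:
        problems.append(f"missing required field: {class_field}")
    # Any other key is reported, which also catches case mismatches such as "class"
    allowed = {class_field, *optional_fields}
    for name in sorted(set(result) - allowed):
        problems.append(f"unexpected field (check spelling and case): {name}")
    return problems


print(check_output('{"CLASS": "A", "probability": 0.85}'))
print(check_output('{"class": "A"}'))
```

The first call returns an empty list. The second reports both the missing `CLASS` field and the unexpected lowercase `class`, because the comparison is case-sensitive.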
Model Testing Size and Number of Folds
Model Subtype | Information | Default Number |
---|---|---|
Custom Model | Testing Size | N/A |
Custom Model | Training Size | N/A |
WekaModel | Number of Folds for k-Fold Cross-Validation | 5 |
#
# Convert testingSize to k-CrossValidation (Number of Folds)
#
[Conversion Formula]
numberOfFolds = Math.round(1 / testingSize);
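For reference, the same conversion can be reproduced in Python. This is a sketch under one assumption worth stating: Java's `Math.round` rounds halves up, whereas Python's built-in `round()` rounds halves to the nearest even number, so the sketch uses `math.floor(x + 0.5)` instead. The function name `number_of_folds` and the range check are illustrative additions.

```python
import math


def number_of_folds(testing_size):
    """Convert a testing-size fraction into a fold count, rounding halves up."""
    if not 0 < testing_size <= 1:
        raise ValueError("testing_size must be greater than 0 and at most 1")
    # math.floor(x + 0.5) mirrors Java's Math.round for positive values
    return math.floor(1 / testing_size + 0.5)


for size in (0.2, 0.25, 0.1):
    print(size, number_of_folds(size))
```

A testing size of 0.2 gives 5 folds, which matches the default number of folds listed for WekaModel.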
Model Important Parameters
Parameter | Type | Description |
---|---|---|
isBestModel | Boolean | Denotes the model with the highest accuracy, where the best model is characterized by an accuracy value closest to 1. |
isSelectedModel | Boolean | Refers to the model selected for calculating the probability or distance of an instance to the model. By default, the Best Model is also the Selected Model. |
isExplainedModel | Boolean | Represents the model used to identify the matching or nearest rule for a given instance. By default, the Explained Model is the PART algorithm. |
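The default relationships between these three flags can be sketched as follows. The function `assign_model_flags` and the accuracy values are hypothetical and serve only to illustrate the defaults described above; ties in accuracy are resolved here by taking the first model, which is an assumption rather than documented behavior.

```python
def assign_model_flags(accuracies, explained_algorithm="PART"):
    """Derive the default isBestModel, isSelectedModel and isExplainedModel flags."""
    # The best model is the one with the highest accuracy (closest to 1)
    best = max(accuracies, key=accuracies.get)
    return {
        name: {
            "isBestModel": name == best,
            "isSelectedModel": name == best,  # default: the best model is also selected
            "isExplainedModel": name == explained_algorithm,
        }
        for name in accuracies
    }


# Illustrative accuracy values, not measured results
flags = assign_model_flags({"JRip": 0.91, "PART": 0.89, "J48": 0.93})
print(flags["J48"])
print(flags["PART"])
```

In this example J48 is both the best and the selected model, while PART remains the explained model even though its accuracy is lower.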
Model Train and Test Configurations
Parameter | Type | Description |
---|---|---|
isTrained | Boolean | Indicates whether the model will be trained. By default, all Weka models are set to be trained (isTrained = true). |
isTested | Boolean | Specifies whether the model will undergo testing. Testing evaluates the model’s performance (e.g., accuracy, correct/incorrect predictions, and total). By default, all Weka models are set to be tested (isTested = true). (Note: A model can only be tested if it has been trained). |
Note
By default, any uploaded custom model has the isTrained and isTested parameters set. However, Ditto Hybrid Platform does not train or test these models. All custom models must be trained and tested before they are uploaded to the server.