Automatic Design Space Explorer automates the process of searching for target processor configurations with favourable cost/performance characteristics for a given set of applications by evaluating hundreds or even thousands of processor configurations.
Input: ADF (a starting point architecture), TPEF, HDB
Output: ExpResDB (Section 2.2.8)
Applications are given to the Explorer as directories that contain the application-specific files. Below is a description of all the possible files inside an application directory.
File Name | Description |
---|---|
program.bc | The program bytecode (produced using `tcecc --emit-llvm`). |
description.txt | The application description. |
simulate.ttasim | TTASIM simulation script, piped to TTASIM to produce the ttasim.out file. If no such file is given, the simulation is started with the "until 0" command. |
correct_simulation_output | Correct standard output of the simulation used in verifying. If you use this verification method, you need to add verification printouts to your code or the simulation script and produce this file from a successful run. |
max_runtime | The application's maximum runtime (as an ASCII number, in nanoseconds). |
setup.sh | Simulation setup script, if something needs to be done before simulation. |
verify.sh | Simulation verification script for additional verification of the simulation; returns 0 if OK. If missing, only correct_simulation_output is used in verification. |
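As a sketch, a minimal simulate.ttasim could run the program to completion and print the executed cycle count. TTASIM scripts are interpreted by a Tcl shell; the `info proc cycles` command used below is an assumption about the simulator's Tcl interface, so check the ttasim documentation of your TCE version:

```
# Run until the program exits (same as the default "until 0").
until 0
# Print the executed cycle count (assumed TCE ttasim Tcl command).
puts [info proc cycles]
quit
```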
Below is an example of the file structure of the HelloWorld application. As the max_runtime file is missing, the application is expected to have no maximum runtime requirement.
HelloWorld/program.bc
HelloWorld/correct_simulation_output
HelloWorld/description.txt
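To illustrate the verification hook, here is a minimal sketch of a verify.sh together with stand-in input files. It assumes verification simply means the simulator output matches the reference byte for byte; the file contents below are made up for the example:

```shell
# Work in a scratch directory so no real application files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in files for a real simulation run (contents are assumptions).
printf 'Hello World\n' > correct_simulation_output
printf 'Hello World\n' > ttasim.out

# A hypothetical verify.sh: return 0 (OK) if the simulation output
# matches the reference exactly, non-zero otherwise.
cat > verify.sh <<'EOF'
#!/bin/sh
diff -q ttasim.out correct_simulation_output > /dev/null
EOF
chmod +x verify.sh

./verify.sh && echo "verification OK"
```

Because the script's exit status is what the Explorer checks, any comparison logic (tolerances, selected output lines) can be substituted for the plain diff.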
The exploration result database <output_dsdb> is always required. The database can be queried, applications can be added to and removed from it, and the explored configurations in the database can be written out as files for further examination.
Please refer to explore -h for a full listing of the possible options.
Depending on the exploration plugin, exploring stores the resulting machine configurations into the exploration result database (DSDB). The best results from the exploration run are given at the end of the exploration:
explore -e RemoveUnconnectedComponents -a data/FFTTest --hdb=data/initial.hdb data/test.dsdb
Best result configurations: 1

Exploration plugins may also estimate the costs of configurations with the available applications. If there are estimation results for the configurations, those can be queried with the --conf_summary option by giving the ordering of the results.
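A possible invocation might look like the following; the ordering key `I` (order by configuration ID) is an assumption for illustration, so check `explore -h` for the keys your version accepts:

```
explore --conf_summary=I database.dsdb
```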
The Explorer plugins explained in the sections below can be listed with the command:
explore -g
and their parameters with the command:
explore -p <plugin name>
These commands can help if, for some reason, this documentation is not up-to-date.
Example:
explore -v -e ConnectionSweeper -u cc_worsening_threshold=10 -s 1 database.dsdb
This reduces the connections in the IC network, starting from configuration number 1, until the cycle count worsens by more than 10%. This algorithm might take quite a while to finish, thus the verbose switch is recommended to print the progress and possibly to pick up interesting configurations during the execution of the algorithm.
The explore tool has a Pareto set finder which can output the interesting configurations from the DSDB after the IC exploration. The Pareto set can be printed with:
explore --pareto_set C database.dsdb
This prints out the Pareto-efficient configurations using the number of connections in the architecture and the execution cycle count as the quality measures. For visualizing the Pareto set you can use the pareto_vis script (installed with TCE) to plot the configurations:
explore --pareto_set C database.dsdb | pareto_vis
SimpleICOptimizer is an explorer plugin that optimizes the interconnection network of the given configuration by removing the connections that are not used in the parallel program.
This functionality is so useful, especially when generating ASIPs for FPGAs, that there is a shortcut script for invoking the plugin.
Usage:
minimize-ic unoptimized.adf program.tpef target-ic-optimized.adf
However, if you want more customized execution, you should read on.
Parameters that can be passed to the SimpleICOptimizer are:
Param Name | Default Value | Description |
---|---|---|
tpef | no default value | name of the scheduled program file |
add_only | false | Boolean value. If set to true, the connections of the given configuration won't be emptied; only new ones may be added. |
evaluate | true | Boolean value. If true, the resulting configuration is evaluated. |
If you pass a scheduled tpef to the plugin, it tries to optimize the configuration for running the given program. If multiple tpefs are given, the first one will be used and the others discarded. If tpef is not given, the plugin tries to schedule the sequential program(s) from the application path(s) defined in the DSDB and uses them in the optimization.
Using the plugin requires the user to define the configuration to optimize. This is done by giving the -s <configuration_ID> option to the Explorer.
Assume there are two configurations in database.dsdb and an application directory path app/. You can optimize the first configuration with:
explore -e SimpleICOptimizer -s 1 database.dsdb
If the optimization was successful, the Explorer should output:
Best result configuration: 3

The add_only option can be used, for example, if you have an application that isn't included in the application paths defined in database.dsdb but you still want to run it with the same processor configuration. First, export the optimized configuration (which has ID 3 in this case):
explore -w 3 database.dsdb
Next schedule the program:
schedule -t 3.adf -o app_dir2/app2.scheduled.tpef app_dir2/app2.seq
And then run explorer:
explore -e SimpleICOptimizer -s 3 -u add_only=true -u tpef=app_dir2/app2.scheduled.tpef database.dsdb
The plugin now uses the optimized configuration created earlier and adds the connections needed to run the other program. If the plugin finds a new configuration, it will be added to the database; otherwise the existing configuration was already optimal. Because the plugin does not remove existing connections, the new machine configuration is able to run both programs.
You can pass a parameter to the RemoveUnconnectedComponents plugin:
Param Name | Default Value | Description |
---|---|---|
allow_remove | false | Allows the removal of unconnected ports and FUs |
When using the plugin, you must define the configuration from which you wish the plugin to remove unconnected components. This is done by passing -s <configuration_ID> to the Explorer.
If you do not allow removal, the plugin will connect the unconnected ports to some sockets. This can be done with:
explore -e RemoveUnconnectedComponents -s 3 database.dsdb
or
explore -e RemoveUnconnectedComponents -s 3 -u allow_remove=false database.dsdb
if you wish to emphasise that you do not want to remove components. This will reconnect the unconnected ports of configuration 3 in database.dsdb.
And if you want to remove the unconnected components:
explore -e RemoveUnconnectedComponents -s 3 -u allow_remove=true database.dsdb
Parameters that can be passed to the GrowMachine are:
Param Name | Default Value | Description |
---|---|---|
superiority | 2 | Percentage value of how much faster schedules are required before the cycle count optimization is stopped. |
Using the plugin requires the user to define the configuration to optimize. This is done by giving the -s <configuration_ID> option to the Explorer.
Example of usage:
explore -e GrowMachine -s 1 database.dsdb
Parameters that can be passed to the ImmediateGenerator are:
Param Name | Default Value | Description |
---|---|---|
print | false | Print information about the machine's instruction templates. |
remove_it_name | no default value | Remove instruction template with a given name |
add_it_name | no default value | Add empty instruction template with a given name. |
modify_it_name | no default value | Modify instruction template with a given name. |
width | 32 | Width supported by the instruction template. |
width_part | 8 | Minimum size of width per slot. |
split | false | Split immediate among slots. |
dst_imm_unit | no default value | Destination immediate unit. |
Example of adding a new 32-bit wide immediate template named newTemplate that is split among buses:
explore -e ImmediateGenerator -s 1 -u add_it_name="newTemplate" -u width=32 -u split=true database.dsdb
Parameters that can be passed to the ImplementationSelector are:
Param Name | Default Value | Description |
---|---|---|
ic_dec | DefaultICDecoder | Name of the ic decoder plugin. |
ic_hdb | asic_130nm_1.5V.hdb | Name of the HDB from which the implementations are selected. |
adf | no default value | An ADF for which the implementations are selected if no database is used. |
Example of creating an implementation for configuration ID 1 in the database:
explore -e ImplementationSelector -s 1 database.dsdb
Parameters that can be passed to the MinimizeMachine are:
Param Name | Default Value | Description |
---|---|---|
min_bus | true | Minimize buses. |
min_fu | true | Minimize function units. |
min_rf | true | Minimize register files. |
frequency | no default value | Running frequency (in MHz) for the applications. |
Example of minimizing configuration ID 1 in the database with a frequency of 50 MHz:
explore -e MinimizeMachine -s 1 -u frequency=50 database.dsdb
The connections are created between the register files in neighboring clusters and the extra architecture. If build_idf is set to true, the hardware databases to be used when searching for component implementations can be specified with the -b parameter.
Parameters that can be passed to the ADFCombiner are:
Param Name | Default Value | Description |
---|---|---|
node_count | 4 | Number of times the node is replicated. |
node | node.adf | The architecture of node that will be replicated. |
extra | extra.adf | The architecture which will be added just once. |
build_idf | false | If defined, ADFCombiner will try to create .idf definition file. |
vector_lsu | false | If defined, the VectorLSGenerator plugin will be called to create a wide load-store unit. |
address_spaces | data | The semicolon-separated list of address space names to be used by the generated wide load-store units (one unit per address space). |
Example of creating an architecture with eight clusters and wide load-store units (one per address space), without creating an implementation definition file:
explore -e ADFCombiner -u node=myNode.adf -u extra=myExtra.adf -u node_count=8 -u vector_lsu=true -u address_spaces="local;global" test.dsdb
If successful, the Explorer will print out the configuration number of the architecture created in test.dsdb. The created architecture can be written to an architecture file with:
explore -w 2 test.dsdb
Example of creating an architecture with four clusters, with an implementation definition file:
explore -e ADFCombiner -u node=myNode.adf -u extra=myExtra.adf -u node_count=4 -u build_idf=true -b default.hdb -b stream.hdb test.dsdb
If successful, the Explorer will print out the configuration number of the architecture created in test.dsdb. The created architecture and implementation definition files can be written out with:
explore -w 2 test.dsdb
VLIWConnectIC takes an .adf architecture as input and arranges its FUs into a VLIW-like interconnection. This is typically used as a baseline for running the BusMergeMinimizer and RFPortMergeMinimizer plugins. The output machine has some empty buses whose instruction slots are used to encode long immediates.
Param Name | Default Value | Description |
---|---|---|
wipe_register_file | yes | Replace the original register file(s) with a native VLIW RF. |
limm_bus_count | 1 | Number of empty buses used to encode long immediates. |
simm_width | 6 | Short immediate width. |
Example of invoking the plugin:
explore -e VLIWConnectIC -a test.adf -s 1 test.dsdb
explore -w 2 test.dsdb
Pekka Jääskeläinen 2018-03-12