Getting started


System requirements

Minimum requirements:

  • OS: Windows 7 (64 bit) Service Pack 1
  • Processor: Intel i5 Quad core
  • 4GB RAM
  • Video card capable of running OpenGL 3.3
  • Storage: 10GB of free hard drive space available

Recommended:

  • OS: Windows 10 (64 bit)
  • Processor: Intel i7 Quad core
  • 16GB RAM
  • Video card capable of running OpenGL 4.0
  • Storage: 30GB of free SSD space available

Supported programming languages and platforms

Superluminal is capable of displaying performance data for the C++, C# and Rust programming languages, on the Windows, Xbox One®, Xbox Series X®, PlayStation®4 and PlayStation®5 platforms.

  • To view PlayStation 4 documentation, click here.
  • To view PlayStation 5 documentation, click here.
  • To view Xbox One or Xbox Series X documentation, click here.

Xbox and PlayStation support is only accessible to registered developers for the respective platforms. To gain access to the Superluminal Xbox and PlayStation plugins, please contact us and we will send you instructions on how to confirm your developer status. After your status is confirmed, the Superluminal auto-updater will automatically download the required plugins for you.


Licensing and Activation

When the application is launched for the first time, you will need to enter a license key.

To start the trial period, press the ‘Try’ button. The trial is fully functional and lasts for 14 days. A license can be purchased through the Pricing page. After purchasing, your license key will be sent to your email. You can use this key to activate the application. An internet connection is required to complete the activation process. If you encounter any issues during the licensing process, please contact us.

After activating the product it is possible to work without an internet connection, but periodically a connection with the licensing server will be established to validate the license. If connecting to this server continues to fail for a period of time, you will be asked to connect to the internet before continuing.

After an initial activation has been completed, it is possible to manage your license by selecting Tools/Licensing from the main menu.

Here you can see your current license key and you have the option to enter a new license key. This can, for instance, be used to upgrade an activated trial license into a purchased license, or to renew an existing license. In cases where you want to migrate your license to another machine, you can manually deactivate your license on your machine so that the key can be activated on another machine. After deactivation, the application will no longer be usable on the machine it was deactivated on, until a new key is entered.

Automatic Activation

It is possible to pre-install a license so that Superluminal is automatically activated, without the user having to enter a license. To do so, the Superluminal installer can be run with a commandline option specifying where the license is located. The license should be located on the target machine and an absolute path should be provided. Furthermore, the /S commandline option can be used to silently install Superluminal. An example commandline that can be used to silently install Superluminal and automatically install a license located on the target machine:

SuperluminalPerformance-1.0.3115.1157.exe /S /lic=<absolute_path_to_license.lic>
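
For scripted deployments, the command line above can be assembled programmatically. The following Python sketch only builds the argument list and does not run the installer. The helper name build_install_command and the license location are hypothetical; only the /S and /lic= options come from the text above.

```python
import ntpath


def build_install_command(installer_path, license_path):
    """Return the argument list for a silent install with a pre-installed license.

    Only /S and /lic= are taken from the documentation; the function name
    and the paths used below are illustrative.
    """
    # The license path must be absolute on the target machine.
    # ntpath applies Windows path rules regardless of the host OS.
    if not ntpath.isabs(license_path):
        raise ValueError("license path must be absolute: " + license_path)
    return [installer_path, "/S", "/lic=" + license_path]


command = build_install_command(
    "SuperluminalPerformance-1.0.3115.1157.exe",
    r"C:\licenses\superluminal.lic",  # hypothetical location
)
print(" ".join(command))
```

On the target machine, a list like this could be handed to a process launcher such as subprocess.run. Rejecting relative paths up front reflects the requirement that the license path is absolute.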

Sampling and Instrumentation

Sampling and instrumentation each have their advantages and disadvantages. Advantages of sampling:

  • You can hit the ground running without making code modifications
  • You can spot problems that you did not anticipate. With instrumentation, you need to insert tags in places that you suspect could be problematic
  • Sampling can give you kernel-level stacks or stacks from libraries that you do not have control over

However, instrumentation has advantages over sampling:

  • Instrumentation is more precise. Despite the high sampling frequency, absolute precision is better achieved with instrumentation.
  • Instrumentation can provide context. What file were you loading? What frame are you in? What state is your code currently in?

Unlike traditional profilers, the Superluminal profiler doesn't force you to make a tradeoff between sampling and instrumentation. Out of the box, the Superluminal profiler is a sampling profiler that runs at 8 kHz.
You can begin simply by starting a profiling session and incrementally add instrumentation events by using the Performance API as you discover where you want to place these events. The sampling data is then combined with the instrumentation events you add, giving you the best of both worlds. See the Instrumentation Timings View and Threads View sections for more information.
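
To put the 8 kHz sampling rate in perspective, the arithmetic below shows the time between samples and the average number of samples that land inside a span of a given length. This is a back-of-the-envelope sketch: the function names are illustrative, and it assumes perfectly regular sampling, which a real sampler does not guarantee.

```python
SAMPLING_FREQUENCY_HZ = 8000  # default sampling rate stated above


def sample_interval_us(frequency_hz):
    """Time between two consecutive samples, in microseconds."""
    return 1_000_000 / frequency_hz


def expected_samples(duration_us, frequency_hz):
    """Average number of samples that land inside a span of the given length."""
    return duration_us / sample_interval_us(frequency_hz)


print(sample_interval_us(SAMPLING_FREQUENCY_HZ))      # 125.0
print(expected_samples(2000, SAMPLING_FREQUENCY_HZ))  # 16.0
print(expected_samples(50, SAMPLING_FREQUENCY_HZ))    # 0.4
```

Under these assumptions, a call that lasts 50 microseconds is hit by a sample less than half of the time on average. This is the kind of case where an instrumentation event gives a more precise timing.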


Starting a new profiling session

Before making your first capture, please be aware that in order to get symbol information, the application you're profiling must be built with certain settings. See Compiler & Linker settings for more information.

To start a new profiling session, enter the New Session page. If the page is not already visible, you can click File/New Session in the menu bar to go to the New Session page.

Here you can choose to launch a new process through 'Run', to attach to an already running process through 'Attach', or to go to the Session Explorer. The Session Explorer will show all of your stored profiling sessions, allowing you to quickly find previously recorded sessions, to annotate and manage them. For more detailed information about the Session Explorer, see Managing Sessions & runs.

If PlayStation and/or Xbox profiling is enabled for your license, a platform subselection menu will appear for both Run and Attach:

If you're a registered PlayStation and/or Xbox developer, and you wish to gain access to the Superluminal Xbox and PlayStation plugins, please contact us and we will send you instructions on how to confirm your developer status. The plugins will then be installed automatically through the auto-updater, and this submenu will appear for you as well.

Once we've selected either Run or Attach from the menu, we can select how to profile our application. This interface depends on the platform that you have selected. For instance, console platforms allow you to choose a devkit to launch your application on. Please refer to the documentation for non-Windows platforms for more details. On Windows, we can just browse to an application that we want to run, along with an optional working directory and commandline arguments. To start measuring right from the beginning of the application, check ‘Start profiling immediately’. If you prefer to launch the application first, and then select a specific time to start profiling, uncheck this box. Click ‘Run’ to launch the target application.

Note: a popup will appear asking whether the ProfilerCollectionService is allowed to run. This is the Superluminal service that will collect profiling data. Click ‘Yes’ to continue. If you chose to profile right from the start, the following screen will appear:

As soon as the timer is running, performance data of the target application is being captured. Press 'Start analyzing' to end the capture and start loading it. Press 'Cancel profiling' to return to the New Session page.

In case you decided not to start profiling from the start, the recording screen will be slightly different and look like this:

Here you can see that the timer is not running yet, which means the capture has not started yet, although the target application is already running. Click 'Start recording' to start capturing performance statistics, or click 'Cancel recording' to return to the New Session page.

Once a capture is stopped, it will first be written to disk, after which it will be loaded.

When this is run for the first time, symbols need to be downloaded and converted for use by Superluminal. Depending on the internet speed and the amount of symbols required, this may take some time. After they are downloaded and converted, a local cache will make sure that this only needs to happen once. For more information on configuring symbol resolve settings, please see Symbol Resolving.


Quick overview of the UI

The User Interface is divided into four major UI components:

  1. Instrumentation timings
  2. Threads view
  3. Callgraph, Function list and thread interaction views
  4. Source & disassembly

When doing performance analysis, people naturally go through a few stages of determining where bottlenecks lie. When a program is not running as expected, you'd want to find hotspots as soon as possible. As soon as the hotspot is found, you dig deeper to understand the context of the problem. And finally, you can drill down into the details by inspecting timings in source code or even on a per-assembly instruction level. This top-down flow is reflected in the UI as follows:

  1. The Instrumentation Timings view will be empty until instrumentation events are sent from within the application. Instrumentation events are optional and without them, it is still easily possible to get a great overview of bottlenecks through the Threads view. However, when some high-level events are sent from the application, like a per-frame event, it is very convenient to spot performance spikes, or understand what your average framerate is.
  2. The Threads view displays a full recording of your threads on a timeline. At the top of each thread, an overview of the thread's activity is displayed. When the thread is opened up, the full recording of thread activity is displayed. Traditional sampling profilers do not have this temporal view of data and are mostly centered around callgraphs. This view is incredibly powerful because it displays the full context around any hotspot.
  3. The CallGraph and Function list views are traditional callgraph and butterfly views, except for the fact that they can filter out on any time range quickly.
  4. The Source & disassembly view can display per-line and per-instruction level timings.

To understand how the UI 'flows' even better, it's good to understand how time selections work. It is best explained by example:

  1. We selected an instrumentation event we would like to inspect from the events view. By clicking on it, the timerange for that event is selected in the Threads view.
  2. The Threads view highlights the selection.
  3. The callgraph and function list views respond to the selection and display the information for that time range only, allowing you to inspect just that particular piece of code.
  4. The source code view reflects only the timings for this time range, allowing you to inspect the area you are interested in.

Working with Sessions

Navigating the UI


Both the Instrumentation Timings view and the Threads view are views that can be panned and zoomed. Being able to navigate them well is important, so there are multiple ways to control these views.

Toolbar buttons

The toolbar buttons on the left side of the graph allow for quick access to the various navigation modes. Clicking them will switch navigation modes. To use the selected mode, click and drag the graph to pan or zoom. To return to regular select mode, click the select toolbar button.

Navigation scrollbar

The navigation scrollbar underneath the graphs will let you both pan and zoom the graph. The center section of the scrollbar pans the graph, while the outer buttons control the zoom level. Click and drag them to zoom and pan.

Input bindings

Panning and zooming can also be controlled by using the configured input bindings from the settings dialog. This is the quickest way of navigating through the views. To view or reconfigure the input bindings or to modify the sensitivity, select Tools/Settings in the menu, and then select the 'Controls' tab.

In this dialog you can select a number of schemes that are used in other profilers. Selecting a preset will set the bindings to the bindings that are used in these programs.

Alternatively, you can modify the bindings manually. Any binding can be configured to have no key at all, to have a single key or to have multiple simultaneous keys. To add keys to a binding, select the key box for the appropriate action and press the key to add. Press backspace to remove keys or escape to cancel and return to the previous setting. If you want to quickly remove all key bindings, press the small 'x' button.

It is also possible to add multiple bindings for a single action by pressing the down arrow on the right side of the action.

If you click 'Add binding', you can create an additional binding for an action. If any of the bindings for a single action is active, the action will be active. In the following example, we've bound the Pan action to both the left and right mouse buttons, so that they work on either left or right mouse button:

Although an action does not require a key, we recommend always binding a key for commonly used functions like panning and zooming: when a key is pressed it disallows interaction with the UI that resizes the threads, making panning and zooming a bit easier.
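
The rule that an action is active as soon as any one of its bindings is active can be modeled in a few lines. The sketch below is a simplified model and not the application's implementation: the input names are made up, and keys and mouse buttons are treated alike as members of one set.

```python
def is_action_active(bindings, pressed_inputs):
    """Return True if at least one binding has all of its inputs held down.

    bindings: list of sets, one set of simultaneous inputs per binding.
    pressed_inputs: set of inputs that are currently held down.
    """
    return any(binding <= pressed_inputs for binding in bindings)


# The Pan action bound to either the left or the right mouse button.
pan_bindings = [{"LeftMouse"}, {"RightMouse"}]
# A single binding that needs two simultaneous inputs (hypothetical example).
measure_bindings = [{"Shift", "LeftMouse"}]

print(is_action_active(pan_bindings, {"RightMouse"}))              # True
print(is_action_active(measure_bindings, {"LeftMouse"}))           # False
print(is_action_active(measure_bindings, {"Shift", "LeftMouse"}))  # True
```

Each binding is a set because all of its inputs must be held at the same time, while the list of bindings is combined with a logical "or".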

Each action should be configured with a mouse binding. It depends on the action what kind of bindings are supported. The left/middle/right mouse button bindings implicitly mean a binding where the mode is activated through dragging the mouse. A mouse wheel binding is activated by simply scrolling the mouse wheel. The 'Zoom' action supports both dragging and mouse wheel bindings. The Vertical scroll only supports the mouse wheel binding. All other actions are drag operations.


The Instrumentation Timings view

When instrumentation events are added to the target application, the Instrumentation Timings view can be used to plot all instances in a chart.

Inspecting instrumentation events

By default, the Instrumentation Timings view will not contain any data until instrumentation events are sent from the target application. This can be achieved using the Performance API. This is optional, as the default sampling engine already provides a great starting point for making the first steps into profiling. See also Sampling and Instrumentation. When events are sent to the profiler, the UI is populated with information:

In this example we sent a "Render" event each frame so that we can measure framerate and get an overview of the framerate. You can add anything that helps you in organizing and adding context to your profiling session. We can select the instrumentation event type that was sent, as well as a thread. The chart itself can be controlled as explained in Navigating the UI. Because the chart can be zoomed in and out, a single bar can represent multiple instrumentation events. In such cases, a single bar in the chart is colored to represent the average and maximum length of the events that it represents. The lighter blue represents the average time, the darker blue represents the maximum time. When hovering over a bar, this information is displayed in a tooltip:

In this example, the average Render time (~0.6ms) is much shorter than the peak Render time (~2ms). When zooming in on this graph, the combined bar will split into separate bars until you reach the zoom level where each bar is drawn separately. In this case, the bar is always light blue. The following image clearly displays the variation in framerate and why the average time is much lower than the peak time:

To investigate an instrumentation event further, click on the bar to select it in the Threads view. This will also automatically select it in the CallGraph and Function list views.

Note: if we had selected a bar that represents multiple instrumentation events, the entire timerange from the first event until the last event would have been selected in the threads view.
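
The way a combined bar summarizes several events can be sketched as follows. This is an illustrative model, assuming each event is a (start, duration) pair in milliseconds; the function name, the field names and the sample data are made up.

```python
def combine_events(events):
    """Summarize the events drawn as one bar.

    events: list of (start_ms, duration_ms) tuples, in any order.
    Returns the average and maximum duration, plus the time range that
    would be selected when the combined bar is clicked.
    """
    durations = [duration for _, duration in events]
    first_start = min(start for start, _ in events)
    last_end = max(start + duration for start, duration in events)
    return {
        "average_ms": sum(durations) / len(durations),
        "max_ms": max(durations),
        "selected_range_ms": (first_start, last_end),
    }


# Four hypothetical "Render" events; one frame is a clear spike.
render_events = [(0.0, 0.5), (16.0, 0.5), (32.0, 2.0), (48.0, 0.5)]
summary = combine_events(render_events)
print(summary)
```

A single slow frame raises the maximum far more than the average, which is why a combined bar shows both values.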

To investigate this instrumentation event further, the Threads, CallGraph or Function list views can be inspected more thoroughly.

Controlling the chart

For an explanation of how to zoom in and out of the graph, see Navigating the UI. In addition, to quickly zoom to extents, use the 'zoom to extents' toolbar button. To control the height of the chart, you can either:

  • Use the vertical scrollbar
  • Press the 'normalize' toolbar button to normalize the height of the chart
  • Press the 'average' toolbar button to set the height of the chart to the average of the entire chart

The Threads view

The Threads view contains the data for all threads, with one row per thread. Each row displays the thread ID and name for that particular thread, as well as an overview of the high-level thread activity. Each row can be expanded to inspect more detailed data about that thread's activity. For information on how to set the thread name, see the Thread names section.

Selecting threads

Threads can be selected by clicking on the thread name. When holding the CTRL key while clicking, threads can be either added to or removed from the selection. When holding the SHIFT key, the range of threads between the last clicked thread and the currently clicked thread is added to the selection:

Another convenient way of quickly selecting threads is by selecting 'similar' threads. Right-click on a thread to open the context menu:

The context menu will have an option to select threads that are named similarly to your thread, and it will indicate in parentheses how many of these threads are present in the capture. In this case, selecting this option will select nine threads that all start with the name 'Job Scheduler':

When threads are selected in the Threads view, the CallGraph and Function list views will respond to the new selection and filter on the newly selected threads. The thread selection will remain in sync with CallGraph and Function list views: when a different set of threads is selected in any of these views, the Threads view will also select these threads.
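
One plausible way to think about 'similar' threads is to compare names after stripping a trailing number. The exact matching rule used by the application is not described here, so the sketch below is an assumption for illustration only, with made-up thread names.

```python
import re


def base_name(thread_name):
    """Strip a trailing number so 'Job Scheduler 3' becomes 'Job Scheduler'."""
    return re.sub(r"\s*\d+$", "", thread_name)


def select_similar(thread_names, clicked_name):
    """Return all threads whose base name matches that of the clicked thread."""
    target = base_name(clicked_name)
    return [name for name in thread_names if base_name(name) == target]


threads = ["Main", "Streaming", "Job Scheduler 1", "Job Scheduler 2", "Job Scheduler 3"]
print(select_similar(threads, "Job Scheduler 2"))
```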

Once threads are selected, they can be reordered or hidden from the view. See Sorting and reordering, or Hiding and unhiding for more information.

Expanding and collapsing

To expand or collapse a thread, press the expand icon:

This will expand a single thread fully so that all of the activity on that thread is displayed:

Threads can also be expanded by dragging the horizontal separator between threads. This is a convenient way to expand threads only partly. The cursor will switch to a resize icon when hovering over the horizontal separator:

As a fast shortcut, double-clicking on the thread name will always toggle the collapse/expand state as well.

Thread activity and interaction

Each thread is initially in a collapsed state, giving you a high-level overview of thread activity and how threads interact with each other.

The green color in the overview means the thread is in an executing state. Any other colors are variations on wait states. When hovering over the various colors, a tooltip is displayed explaining what the thread was doing at that time.

Depending on the zoom level, arrows are visible that indicate how threads interact with each other: how a thread is unblocked by another thread. This is very convenient to see how threads interact with each other. When hovering over a wait state, the arrow for that wait state will become visible and will animate. In the following example, the Streaming thread was blocking and waiting to be unblocked. It eventually got unblocked by 'Job Scheduler 3'. From what we can deduce at this point, it appears that a streaming thread is waiting for some command to (possibly) read or write data. A job in the job scheduler eventually kicks it to perform that operation.

A thread can be unblocked by another thread, but it may not yet be scheduled in by the thread scheduler. The length of the horizontal part of the arrow indicates the duration between the thread being set into a ready state and the time it was actually scheduled in by the OS. When hovering over a wait state, or when hovering over an arrow, click the arrow or wait state to get more information about the blocking and unblocking callstacks. We can now determine more precisely what was going on in our example.

We can see that Job Scheduler 3 called RequestSave. By clicking on the function, we can see the source code for that function. The source code clearly shows that we unlock a condition, allowing the streaming thread to perform the write. If we want to navigate between the blocking and unblocking stack, we can click the toolbar buttons on top of the stacks to navigate to the various stacks in the Threads view quickly.

Examining thread execution

When expanding a thread, a full recording of that thread is displayed.

In this example, we see our Streaming thread executing code. The light-blue bars are regular sampled functions, the darker blue bars are instrumentation bars. To add instrumentation events, use the Performance API. Notice that the instrumentation is merged into this view as if it was part of a regular callstack. Also notice that these events have additional information on them. The numbers in red are file sizes that we sent to the profiler. This is convenient for us to understand the ratio between write time, size and compression size. Another example is when we hover over one of the events:

The tooltip displays the length of the bar, and in the case of an instrumentation event, the context. In this particular case we can see the filename that we were writing to. This can all be accomplished by providing context to instrumentation events. When we click on a bar, we select it and the CallGraph and Function list views will respond by displaying the information that is related to this particular bar and time range. To understand more about time selections, see Selecting ranges on the timeline.

Selecting ranges on the timeline

A time selection can be created by dragging a selection. By default this can be done simply by left-clicking and dragging a range, but this can be reconfigured in the input bindings. When dragging a selection, the area outside of the selection is dimmed.

The callgraph, function list and source views will respond to the selection that was made and display only the information for this selection.

If you click on a bar within the time range selection, only the part of the bar will be selected that fits within the selected range:

Clicking outside of the selected range, or pressing the escape button will cancel the selection.
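
Selecting only the part of a bar that fits within the selected range amounts to intersecting two time intervals. The sketch below illustrates that idea under the assumption that bars and selections are (start, end) pairs in milliseconds; the function name and the values are illustrative.

```python
def clip_to_selection(bar, selection):
    """Return the part of a bar that lies inside the selection, or None.

    Both arguments are (start_ms, end_ms) tuples.
    """
    start = max(bar[0], selection[0])
    end = min(bar[1], selection[1])
    return (start, end) if start < end else None


selection = (10.0, 20.0)
print(clip_to_selection((5.0, 12.0), selection))   # (10.0, 12.0)
print(clip_to_selection((12.0, 15.0), selection))  # (12.0, 15.0)
print(clip_to_selection((25.0, 30.0), selection))  # None
```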

Measuring time

It can be convenient to measure how long something takes. The measure function can be accessed in two ways. The first one is by clicking the measure toolbar button:

When in measure mode, click the left mouse button and drag the mouse. You will see the timing for that time range.

To exit the measure mode, click the Select button in the toolbar. A quicker way to measure the length is to use the input binding for measuring. This is set to SHIFT by default, but can be altered by selecting Tools/Settings in the menu, and then selecting Controls.

Sorting and reordering

It can be convenient to sort the threads in a particular way so that threads are grouped together in a logical fashion. Superluminal offers ways to globally sort and manually reorder threads. Threads can be globally sorted through the sort toolbar button:

There are four ways to sort threads globally:

  • Start time. Sorts all threads by their start time. If threads have identical start times, the thread with the most utilization is preferred
  • Utilization. Sort all threads by their utilization (i.e. how much time they spend executing code)
  • Thread name. Sort all threads by their name
  • Thread ID. Sort all threads by their ID
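
The four global sort modes can be expressed as sort keys. The sketch below is a model under stated assumptions: the ThreadInfo fields and mode names are made up, utilization is assumed to sort from busiest to least busy, and names and IDs are assumed to sort in ascending order.

```python
from dataclasses import dataclass


@dataclass
class ThreadInfo:
    thread_id: int
    name: str
    start_ms: float
    utilization: float  # fraction of time spent executing code, 0.0 to 1.0


SORT_KEYS = {
    # Earliest start first; on a tie, the busier thread comes first.
    "start_time": lambda t: (t.start_ms, -t.utilization),
    # Busiest thread first (assumed direction).
    "utilization": lambda t: -t.utilization,
    "thread_name": lambda t: t.name,
    "thread_id": lambda t: t.thread_id,
}


def sort_threads(threads, mode):
    """Return a new list of threads ordered by the given sort mode."""
    return sorted(threads, key=SORT_KEYS[mode])


threads = [
    ThreadInfo(30, "Streaming", 0.0, 0.10),
    ThreadInfo(10, "Main", 0.0, 0.90),
    ThreadInfo(20, "Job Scheduler 1", 5.0, 0.40),
]
print([t.name for t in sort_threads(threads, "start_time")])
```

In this sample data, Main and Streaming start at the same time, so the tie is broken in favor of Main, which has the higher utilization.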

Alternatively, we can reorder threads manually by dragging them. First, select the threads to reorder. To understand how to multi-select threads, see Selecting threads. Then, simply drag the threads to their desired position:

It can also be convenient to quickly order threads together, perhaps because they have a lot of thread interaction, or because they logically belong together. Select the threads that you want to place side by side, and then order them together through the context menu:

In this example, the streaming thread is woken by a Job Scheduler thread. We want to order these threads together. After clicking 'order together', the view looks like this:

Hiding and unhiding

Often when profiling, not all of the threads are of importance for your profiling task. These threads can be hidden from the view. First, select the threads to hide. To understand how to multi-select threads, see Selecting threads. Then, simply click the hide icon:

The threads will now be removed from the view. A button on the toolbar will appear to unhide any threads that you have hidden. By clicking on it, the unhide UI will appear:

Here we can unhide threads by double-clicking on single threads or groups of threads. We can also multi-select the threads we want to unhide and then select 'unhide':


The CallGraph view

The CallGraph view displays statistics for each function in a hierarchical fashion. For each function, we can view:

  • Inclusive time, the time of the function itself and all its children, recursively
  • Exclusive time, the time spent only in the function itself
  • Thread state, how much time was spent executing, waiting (and in what wait states)
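
The relationship between inclusive and exclusive time can be shown with a small call tree. The sketch below uses a hypothetical Node class and made-up timings; it only illustrates the definitions above and says nothing about how the profiler stores its data.

```python
class Node:
    def __init__(self, name, self_ms, children=()):
        self.name = name
        self.self_ms = self_ms  # exclusive time: spent in the function itself
        self.children = list(children)

    def inclusive_ms(self):
        """Exclusive time plus the inclusive time of all children, recursively."""
        return self.self_ms + sum(child.inclusive_ms() for child in self.children)


# Hypothetical call tree: Update calls Physics and Render; Render calls Draw.
tree = Node("Update", 1.0, [
    Node("Physics", 3.0),
    Node("Render", 2.0, [Node("Draw", 4.0)]),
])
print(tree.inclusive_ms())              # 10.0
print(tree.children[1].inclusive_ms())  # 6.0
```

A function with a high inclusive time but a low exclusive time, like Update here, is mostly waiting on the functions it calls.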

The view can be filtered in a variety of ways. By default, the view will display data for the active time range. If no time range is selected, the time range for the entire session is used. For information on how to select and clear time ranges, see Selecting ranges on the timeline. To select what threads to filter on, click on the Filter combobox. This will open up a thread selection UI:

In the thread selection UI, threads are automatically grouped on thread name. Both single threads and thread groups can be selected by clicking on them, and then pressing 'Apply'. It is also possible to multi-select threads by holding CTRL and clicking on the threads and/or thread groups that you want to select. To quick-select a thread or group of threads, double-click the item, and it will apply the selection immediately. In the following example, we clicked on the 'Job Scheduler' group, which will filter to all of the Job Scheduler threads:

An alternative way of filtering data in the CallGraph view is by clicking on a bar in the threads view. The view is now filtered so that it displays only the selected function and all its children:

To clear this function filter and return to the thread selection UI, click on the cross icon in the filter.

There are a number of convenience functions to make navigating the callgraph easier. They are accessible through the toolbar and through a context menu that can be accessed by pressing the right mouse button on an item in the callgraph:

  • Set as root. This will set the currently selected node as the root for the callgraph. This is convenient for clamping deep callstacks. When a root is set, it can be cleared by selecting the root node and selecting 'clear custom root' from the context menu or by pressing the toolbar button.
  • Expand hot path. This will recursively expand all the child nodes that have the highest inclusive time.
  • View in function list. The selected function will be selected in the function list so that you can see what the accumulated costs for the function are.
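
The idea behind expanding the hot path is to follow, at every level, the child with the highest inclusive time. The sketch below illustrates this on a hypothetical tree of dictionaries; the data layout and names are assumptions made for the example.

```python
def hot_path(node):
    """Follow the child with the highest inclusive time down to a leaf.

    node: dict with 'name', 'inclusive_ms' and 'children' (a list of nodes).
    Returns the function names along the path.
    """
    path = [node["name"]]
    while node["children"]:
        node = max(node["children"], key=lambda child: child["inclusive_ms"])
        path.append(node["name"])
    return path


tree = {
    "name": "Update", "inclusive_ms": 10.0, "children": [
        {"name": "Physics", "inclusive_ms": 3.0, "children": []},
        {"name": "Render", "inclusive_ms": 6.0, "children": [
            {"name": "Draw", "inclusive_ms": 4.0, "children": []},
        ]},
    ],
}
print(hot_path(tree))
```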

Timings can be displayed in several formats by using the 'Timing display' combobox in the toolbar. In the following example we have switched to 'Absolute %':

The time display options are:

  • Milliseconds. This is the default format. All timings are in absolute time units.
  • Relative %. Each node in the tree will display the percentage of time spent, relative to the parent node.
  • Absolute %. Each node in the tree will display the absolute percentage within the entire graph.
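
The difference between the two percentage formats comes down to which total a node is divided by. The sketch below assumes that 'Absolute %' is measured against the inclusive time of the root of the graph; the timings are made up.

```python
def percent(part_ms, whole_ms):
    """Express part_ms as a percentage of whole_ms."""
    return 100.0 * part_ms / whole_ms


# Hypothetical chain: Update (10 ms) calls Render (6 ms), which calls Draw (4 ms).
root_ms, render_ms, draw_ms = 10.0, 6.0, 4.0

relative = percent(draw_ms, render_ms)  # compared with the parent node
absolute = percent(draw_ms, root_ms)    # compared with the whole graph
print(f"Draw: {draw_ms} ms, relative {relative:.1f}%, absolute {absolute:.1f}%")
```

The same 4 ms of Draw reads as roughly two thirds of its parent but only 40% of the whole graph, so the relative format is handy for local comparisons and the absolute format for judging overall impact.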

The pie chart on the right side of the CallGraph mirrors the statistics of the callgraph, but in a graphical way to make sure you can see the distribution of the timings at a glance. The pie chart can be navigated as well and remains in sync with the CallGraph on the left side. By hovering over the pie chart, the timings are displayed in a tooltip and the full name of the function is displayed in the header.

The pie chart can be navigated by hovering over a pie piece and clicking the left mouse button. To go back to the caller, right-click anywhere on the pie chart.

When navigating through the CallGraph either by selecting functions in the tree or in the pie chart, the Source and Disassembly view is updated to display timings based on the selection that was made.


The Function list view

The Function list view displays a flat, sorted list of functions. Its purpose is to show aggregated times for a function, regardless of the stack. The CallGraph view is an excellent tool for understanding the performance costs of a single path in the code, but in a CallGraph view, it is more difficult to find the combined cost of multiple invocations of the same function from different code paths. The function list view is therefore very convenient for finding the combined time spent in a function, either inclusive or exclusive.

The view can be sorted on inclusive time or exclusive time by clicking on the column headers:

  • The Inclusive time is the time of the function itself and all its children, recursively
  • The Exclusive time is the time spent only in the function itself
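
A flat list of this kind can be derived from sampled call stacks by adding up time per function, regardless of where in the stack it appears. The sketch below is a simplified model with made-up stacks, assuming every sample represents the same amount of time; it is not a description of the profiler's internals.

```python
from collections import defaultdict


def aggregate(stacks, sample_ms):
    """Build per-function inclusive and exclusive times from sampled stacks.

    stacks: list of call stacks, each a list of function names, outermost first.
    sample_ms: time represented by one sample.
    """
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    for stack in stacks:
        # Count a function once per sample, even if it recurses.
        for name in set(stack):
            inclusive[name] += sample_ms
        # The innermost frame is the code that was actually executing.
        exclusive[stack[-1]] += sample_ms
    return dict(inclusive), dict(exclusive)


stacks = [
    ["main", "ReadCompoundProperty", "memcmp"],
    ["main", "GetCompound", "memcmp"],
    ["main", "GetCompound"],
    ["main", "Render"],
]
inclusive, exclusive = aggregate(stacks, sample_ms=0.125)
for name in sorted(exclusive, key=exclusive.get, reverse=True):
    print(name, exclusive[name], inclusive[name])
```

In this data, memcmp is reached through two different code paths, yet its time is reported as a single total. Each individual path would look small in a call graph.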

The view can be filtered in a variety of ways. By default, the view will display data for the active time range. If no time range is selected, the time range for the entire session is used. For information on how to select and clear time ranges, see Selecting ranges on the timeline. To select what threads to filter on, click on the Filter combobox. This will open up a thread selection UI:

In the thread selection UI, threads are automatically grouped on thread name. Both single threads and thread groups can be selected by clicking on them, and then pressing 'Apply'. It is also possible to multi-select threads by holding CTRL and clicking on the threads and/or thread groups that you want to select. To quick-select a thread or group of threads, double-click the item, and it will apply the selection immediately. In the following example, we clicked on the 'Job Scheduler' group, which will filter to all of the Job Scheduler threads:

An alternative way of filtering data in the Function list view is by clicking on a bar in the threads view. The view is now filtered so that it displays only the selected function and all its children:

To clear this function filter and return to the thread selection UI, click on the cross icon in the filter.

The Function list also serves as a 'butterfly' view: a view where we can see where the function was called from, and what it calls. Let's look at a typical use case:

One of the things that pop out in this example is memcmp. Low-level functions like memcmp are typically called from many different locations and therefore harder to spot in the callgraph. The combined time, however, can be significant as shown in this example. By clicking on an item in the function list, we will update the Source and Disassembly view. The source view will display all the time spent in all invocations of the function within the time range. Also, after an item is clicked, we can find out where the function was called from and what code paths were responsible for what portion of the total time spent. The trees on the right of the list show the callers ('called by') and callees ('calls'):

  • The Called by tree shows what functions called the selected function, and how much time was spent in that code path.
  • The Calls tree shows all functions that are being called by the function, and how much time was spent in that code path.
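
The first level of both trees can be approximated by counting, for every sampled stack, the frame directly above and directly below the selected function. The sketch below works on made-up stacks and reports sample counts rather than time; the real trees go deeper than one level.

```python
from collections import Counter


def butterfly(stacks, function):
    """Count direct callers and callees of a function across sampled stacks.

    stacks: list of call stacks, each a list of function names, outermost first.
    Returns two Counters: (called_by, calls).
    """
    called_by = Counter()
    calls = Counter()
    for stack in stacks:
        for index, name in enumerate(stack):
            if name != function:
                continue
            if index > 0:
                called_by[stack[index - 1]] += 1
            if index + 1 < len(stack):
                calls[stack[index + 1]] += 1
    return called_by, calls


stacks = [
    ["main", "ReadCompoundProperty", "memcmp"],
    ["main", "ReadCompoundProperty", "memcmp"],
    ["main", "GetCompound", "memcmp"],
    ["main", "GetCompound"],
]
called_by, calls = butterfly(stacks, "memcmp")
print(called_by.most_common())  # [('ReadCompoundProperty', 2), ('GetCompound', 1)]
print(dict(calls))              # {}
```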

Double-clicking nodes in the trees will center the function in the function list.

In our example we see that multiple functions like ReadCompoundProperty and GetCompound are responsible for the largest portion of memcmp calls. We can open the subnodes to further investigate the paths leading to these functions.

It is also possible to search within the function list. The text box above the list functions as a filter. In the following image you can see how the list is limited to only those functions that contain the 'readCompound' substring:


The Source & Disassembly view

When selecting a function in the CallGraph, Function list or the Thread Interaction view, the Source and Disassembly view is updated to display the function along with timings.

Here you can see how much time was spent, and in what thread states, per source code line. When hovering over the thread state, more timing information is displayed.

If the source file could not be resolved but the image file (DLL, exe) is present on the disk, a disassembly view is displayed. For instance, when clicking on a Windows DLL function, the disassembly is displayed if the signatures of the DLL match and if the process has access to the file. Per-instruction timings will be available:

If the source file could be resolved and the image file is present on the disk, mixed-mode disassembly can be displayed by clicking the disassembly icon in the toolbar.

We can find text in the Source and Disassembly view either by pressing CTRL+F, or by clicking the find button in the toolbar. As in traditional Windows applications, F3 and SHIFT+F3 will go to the next and previous find results.


The Find window

The Find window is a window specifically for finding functions in the Threads view. There is also a local find window in the Source and Disassembly view that simply finds text. To find functions in the Threads view, press CTRL+F, or click the find button in the main toolbar. The Find window will be displayed in the top right of the Threads view.

When you start typing, a list of suggestions is shown. Any sampled or instrumented function whose name partially matches the typed text will be suggested:

After pressing ENTER, or selecting a function from the list, the entire session is searched for all functions that match the name, thread and time range. For information about time ranges, see Selecting ranges on the timeline. The window will display how many hits it found, and the Threads view will highlight all hits in yellow.

To browse through the results, click the next and previous buttons, or use F3/SHIFT+F3 to cycle through them.


Managing sessions & runs

On the 'New session' page, you can manage your previously recorded sessions through the Session Explorer. You can also manage recently launched applications, so that you can quickly re-launch an application with the same set of parameters.

Session Explorer

When we click on the Session Explorer, we see all of the sessions that were previously recorded. By default this acts as a recent file list, because the list is sorted by 'Recently Used'. To open a session, double-click it.

Besides acting as a recent file list, the explorer lets you manage all of your sessions. You can rename a session by clicking on the pencil icon for a single session. You can also add annotations to your session, so that you can remember the details surrounding the particular capture. To do so, click the icon with the text balloon to add notes to your session.

If there are files that are not present in your capture folder, but you still want to open them, use the 'Browse' button to open a traditional browse-to-file dialog.

To clean up your captures folder, each session displays the size it uses on disk. You can also sort the entire list of sessions by their size. By holding CTRL or SHIFT while clicking, you can multi-select sessions. If you press DELETE, or click the 'delete' button in the toolbar, you can erase the selected sessions.

Recently launched

Under 'Recently Launched', we see a number of applications that we have profiled earlier. We may have run an application with different sets of parameters, like different command-line parameters or a different capture frequency. Or perhaps we sometimes run a DEBUG or a RELEASE configuration. To make re-launching the same application in different configurations easier, we categorize the various runs that you have performed. Click on a recently launched application to see the configurations.

Here we can see two configurations for our 'Profiler' application that we have profiled. We ran both a DEBUG and a RELEASE configuration. The differences between the configurations that you have run are shown for each configuration.

Currently, the configurations are still untitled. To organize your configurations, click on the pencil icon to rename your configurations to a sensible name.

When you click on any of the configurations, the parameters in the configuration will be set in the right panel and you can click 'Run' to launch the application with the set of parameters from the configuration directly.

Using the Performance API

The PerformanceAPI can be used to communicate with Superluminal Performance from the target application. It is used to markup code with instrumentation events and to give names to threads. The API can be used either by statically linking to a library, or by dynamically loading a DLL at runtime.

Static linking

The Performance API (lib and header) is located in the ‘API’ subfolder of the installation directory. The libraries that are provided are compatible with VS2015 (toolset v140) and up. The libraries are shipped in multiple configurations. The filename suffixes describe what configuration is used and reflect the Microsoft compiler flags:

  • PerformanceAPI_MD.lib. Release static library with dynamically linked runtime.
  • PerformanceAPI_MT.lib. Release static library with statically linked runtime.
  • PerformanceAPI_MDd.lib. Debug static library with dynamically linked runtime.
  • PerformanceAPI_MTd.lib. Debug static library with statically linked runtime.

CMake

A FindSuperluminalAPI.cmake file is provided that can be used to easily consume the PerformanceAPI from CMake based projects. This module can be used to find the Superluminal API libs & headers via find_package.

For example: find_package(SuperluminalAPI REQUIRED)

You can use it by adding the API directory of your Superluminal install to your CMAKE_PREFIX_PATH, or alternatively by copying the entire API directory to a place of your own choosing and adding that location to CMAKE_PREFIX_PATH.

The following (optional) variables can be set prior to issuing the find_package command:

  • SuperluminalAPI_ROOT The root directory where the libs & headers should be found. For example [SuperluminalInstallDir]\API.
    • If this is not set, the libs & headers are assumed to be next to the location of FindSuperluminalAPI.cmake
  • SuperluminalAPI_USE_STATIC_RUNTIME
    • If this is set, the libraries linked to the static C runtime (i.e. /MT and /MTd) will be returned
    • If not set, the libraries linked to the dynamic C runtime (i.e. /MD and /MDd) will be returned

On completion of find_package, the following CMake variables will be set:

  • SuperluminalAPI_FOUND Whether the package was found
  • SuperluminalAPI_LIBS_RELEASE The Release libraries to link against
  • SuperluminalAPI_LIBS_DEBUG The Debug libraries to link against
  • SuperluminalAPI_INCLUDE_DIRS The include directories to use

In addition, if find_package completed successfully, the target SuperluminalAPI will be defined. You should prefer to consume this target via target_link_libraries(YOUR_TARGET PRIVATE SuperluminalAPI), rather than by using the above variables directly.

Dynamic library

For cases where linking statically to the PerformanceAPI is not desirable, a dynamic library is also provided. The dynamic library provides the same API as the static library, but does not require linking to a library. The DLL and headers are located in the ‘API’ subfolder of the installation directory. The DLL is intended to be used in the following manner:

  • Copy PerformanceAPI_capi.h to your source tree
    • Optionally, also copy PerformanceAPI.dll to your project
  • Load PerformanceAPI.dll through LoadLibrary from Superluminal's installation directory or the location you copied it to.
  • Use GetProcAddress to find the PerformanceAPI_GetAPI function exported from the DLL and cast it to the PerformanceAPI_GetAPI_Func type exposed in the PerformanceAPI_capi.h header.
  • Call the resulting PerformanceAPI_GetAPI function pointer using PERFORMANCEAPI_VERSION for the first argument and a pointer to a PerformanceAPI_Functions struct for the second argument.
    • If successful (i.e. return code is 1), the PerformanceAPI_Functions struct will be filled with function pointers to the API
  • Use the PerformanceAPI_Functions struct to call API functions as needed.

In code form, a working example of the above is as follows. Note that a helper function, PerformanceAPI_LoadFrom, is also provided in header PerformanceAPI_loader.h that does all this work for you, so this just serves as an example.

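// Note: this is just an example; a full working implementation is provided in PerformanceAPI_loader.h
#include <windows.h>             // LoadLibraryW, GetProcAddress, FreeLibrary
#include "PerformanceAPI_capi.h" // adjust the path to wherever you copied the header

// Returns 1 and fills outFunctions on success, 0 on failure.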
int PerformanceAPI_Load(const wchar_t* inPathToDLL, PerformanceAPI_Functions* outFunctions)
{
    HMODULE module = LoadLibraryW(inPathToDLL);
    if (module == NULL)
        return 0;

    PerformanceAPI_GetAPI_Func getAPI = (PerformanceAPI_GetAPI_Func)((void*)GetProcAddress(module, "PerformanceAPI_GetAPI"));
    if (getAPI == NULL)
    {
        FreeLibrary(module);
        return 0;
    }

    if (getAPI(PERFORMANCEAPI_VERSION, outFunctions) == 0)
    {
        FreeLibrary(module);
        return 0;
    }

    return 1;
}

int main(int _argc, char** _argv)
{
    PerformanceAPI_Functions performanceAPI;

    // Load the API from the default installation path
    if (!PerformanceAPI_Load(L"C:\\Program Files\\Superluminal\\Performance\\API\\dll\\x64\\PerformanceAPI.dll", &performanceAPI))
        return -1;

    // Add an instrumentation event
    {
        performanceAPI.BeginEvent("Example Instrumentation", NULL, PERFORMANCEAPI_DEFAULT_COLOR);
        // ... some long running code here ...
        performanceAPI.EndEvent();
    }

    return 0;
}

Instrumentation

To send instrumentation data to Superluminal, two mechanisms are provided:

  • The InstrumentationScope class. This class will signal the start of a scope in its constructor and signal the end of the scope in its destructor. InstrumentationScopes can be freely nested.
  • The PerformanceAPI_* free functions.
    • These functions are designed for integration with existing profiling systems that, for example, already define their own scope-based profiling classes. Note: calls to the PerformanceAPI_BeginEvent/PerformanceAPI_EndEvent functions must be within the same function. For example, it is not allowed to call PerformanceAPI_BeginEvent in function Foo and PerformanceAPI_EndEvent in function Bar.
    • Additional overloads with postfix _N are provided that can be used with strings that are not null-terminated.
    • This is a C API to ease integration with other programming languages.
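
As an illustration of the integration case described above, the following is a minimal sketch of a scope class from a hypothetical in-house profiler that forwards to begin/end calls. MyProfileScope, StubBeginEvent and StubEndEvent are names invented for this example; the two stubs stand in for PerformanceAPI_BeginEvent and PerformanceAPI_EndEvent so that the sketch compiles without the SDK.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for PerformanceAPI_BeginEvent / PerformanceAPI_EndEvent (assumption:
// used here only so the sketch builds without the SDK). They record call order.
std::vector<std::string> g_log;
int g_openEvents = 0;

void StubBeginEvent(const char* id)
{
    g_log.push_back(std::string("begin ") + id);
    ++g_openEvents;
}

void StubEndEvent()
{
    assert(g_openEvents > 0); // every end must match an earlier begin
    --g_openEvents;
    g_log.push_back("end");
}

// Hypothetical scope class of an existing profiling system. Begin and end are
// tied to the lifetime of one local object, so they are always paired inside
// the enclosing function, and nested scopes close in reverse order.
class MyProfileScope
{
public:
    explicit MyProfileScope(const char* id) { StubBeginEvent(id); }
    ~MyProfileScope() { StubEndEvent(); }

    // Non-copyable: a copy would emit a second, unmatched end event.
    MyProfileScope(const MyProfileScope&) = delete;
    MyProfileScope& operator=(const MyProfileScope&) = delete;
};

void LoadLevel()
{
    MyProfileScope outer("LoadLevel");
    {
        MyProfileScope inner("ReadFile");
        // ... work being measured ...
    }
}
```

In a real integration, the stub calls would be replaced by PerformanceAPI_BeginEvent(id, nullptr, PERFORMANCEAPI_DEFAULT_COLOR) and PerformanceAPI_EndEvent(), as used in the example further below.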

When sending an instrumentation event through either of these mechanisms, two pieces of data can be provided:

  • The event ID (required). This must be a static string (e.g. a regular C string literal). It is used to distinguish events in the UI and is displayed in all views (Instrumentation Chart, Timeline, CallGraph). It is important that the ID of a particular scope remains the same over the lifetime of the program: it is not allowed to use a string that changes for every invocation of the function/scope. Some examples of IDs: the name of a function ("Game::Update"), the operation being performed ("ReadFile").
  • The event Data (optional): This must be a string that is either dynamically allocated or a regular string literal. You are free to put whatever data you want in the string; there are no restrictions. The data is also free to change over the lifetime of the program. The intent of the data string is to include data in the event that can differ per instance. This data is displayed in the Instrumentation Chart and Timeline for each single instrumentation event. Some examples of Data strings: the current frame number (for "Game::Update"), the path of the file being read (for "ReadFile").
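
The difference between the ID and the Data can be sketched as follows. This is an illustration only: StubBeginEvent, StubEndEvent, RecordedEvent, kUpdateEventId and UpdateFrame are hypothetical names, and the stubs stand in for the real API so that the sketch compiles without the SDK. The point is that the ID is one string literal reused on every invocation, while the Data string is rebuilt each time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the stub remembers for each event. Keeping the ID as a pointer and the
// data as a copy is a choice made for this sketch; it mirrors the documented
// rule that the ID must be static while the data may be dynamically allocated.
struct RecordedEvent
{
    const char* id;
    std::string data;
};

std::vector<RecordedEvent> g_events;

// Stand-ins for the begin/end functions of the API (assumption: illustration only).
void StubBeginEvent(const char* id, const char* data)
{
    g_events.push_back({id, data ? data : ""});
}

void StubEndEvent() {}

// The event ID: a string literal that never changes during the program's lifetime.
const char* const kUpdateEventId = "Game::Update";

void UpdateFrame(int frameNumber)
{
    // The event data: built per call, free to differ between invocations.
    const std::string data = "frame " + std::to_string(frameNumber);

    StubBeginEvent(kUpdateEventId, data.c_str());
    // ... per-frame work ...
    StubEndEvent();
}
```

Calling UpdateFrame(1) and then UpdateFrame(2) records two events with the same ID and different data strings, which is the pattern the ID and Data descriptions above call for.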

All const char* arguments in the API are assumed to be UTF8 encoded strings (i.e. non-ASCII chars are fully supported).

To completely disable the API, you can define PERFORMANCEAPI_ENABLED to 0 prior to including PerformanceAPI.h. This macro will cause all functions to be compiled out and the PERFORMANCEAPI_INSTRUMENT_* macros to be empty.

Thread names

Thread names can be visualized in the profiler. If you are already using the Windows SetThreadDescription API, the names will automatically appear in the profiler. Note however, that this function is only available starting with Windows 10, build 1607 (Anniversary Update). The Performance API also has a function to set thread names. It will use SetThreadDescription internally when available, but falls back to a custom method that will send the names to the profiler manually. To use it, call PerformanceAPI_SetCurrentThreadName for the currently active thread.

Example

The following is a fully-functional example of how the API can be used to instrument your code and set thread names. For brevity, it assumes usage of the static library, but equivalent functionality can be achieved with the DLL interface.

#include "PerformanceAPI/PerformanceAPI.h"
#include <thread>
#include <chrono>

void MacroInstrumentedFunction()
{
    PERFORMANCEAPI_INSTRUMENT_FUNCTION();
    std::this_thread::sleep_for(std::chrono::milliseconds(16));

    {
        PERFORMANCEAPI_INSTRUMENT_DATA("Nested scope with contextual data", "");
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }

    {
        PERFORMANCEAPI_INSTRUMENT_COLOR("Nested scope with color", PERFORMANCEAPI_MAKE_COLOR(255, 0, 0));
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
}

int main(int _argc, char** _argv)
{
  PerformanceAPI_SetCurrentThreadName("Main");

  {
      PerformanceAPI_BeginEvent("Pass 1", nullptr, PERFORMANCEAPI_DEFAULT_COLOR);
      for (int i = 0; i < 5; ++i)
      {
          MacroInstrumentedFunction();
      }
      PerformanceAPI_EndEvent();
  }

  {
      PerformanceAPI_BeginEvent("Pass 2", nullptr, PERFORMANCEAPI_DEFAULT_COLOR);
      for (int i = 0; i < 5; ++i)
      {
          MacroInstrumentedFunction();
      }
      PerformanceAPI_EndEvent();
  }

  return 0;
}

When the above example program is profiled with Superluminal, it will result in a profile that looks something like the following:


Symbol resolving

In order to resolve symbols, the application you're profiling must be configured correctly. In addition, Superluminal Performance can be configured to retrieve symbols and source files from symbol and source servers, respectively.

Compiler & Linker settings

The following is a list of compiler & linker settings that need to be enabled in the configuration properties of the application (and related modules) you're profiling, in order to be able to correctly resolve symbols.

  • Compiler
    • C/C++ -> General -> Debug Information Format
      • C7 compatible (/Z7) or
      • Program Database (/Zi)
  • Linker
    • VS2015 and earlier:
      • Linker -> Debugging -> Generate Debug Info -> Yes (/DEBUG)
    • VS2017 and newer:
      • Linker -> Debugging -> Generate Debug Info -> Generate Debug Information optimized for sharing and publishing (/DEBUG:FULL)

In particular, note that /DEBUG:FASTLINK is not supported. Profiling applications built with /DEBUG:FASTLINK will result in no symbols being resolved.


Symbol locations

You can add or edit locations that should be searched when symbol (*.pdb) or image files (*.exe, *.dll) are needed during symbol resolving by going to the Tools/Settings menu and selecting the 'General' tab.

An arbitrary number of symbol locations can be added. Do keep in mind that a large number of symbol locations may slow down the symbol resolving process, as each location must be tried for any unmatched symbol or image file. When symbol & image files are retrieved from symbol locations, they are cached in the Symbol Cache directory, which can also be edited through the Settings menu. It is recommended to place the Symbol Cache directory on a drive with sufficient space, as it may grow quite large.

Finally, there are two ‘types’ of symbol locations that can be specified: symbol servers and local directories.

Symbol Servers

A Symbol Server is nothing more than a directory with a structure as produced by Microsoft’s SymStore tool. The Symbol Server can be accessed as a local directory, over HTTP(S), or through a network share. For example, the following are all valid Symbol Server locations:

  • https://msdl.microsoft.com/download/symbols
  • D:\Symbols
  • \\SomeNetworkShare\Symbols
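
To give an idea of what such a directory structure looks like, the sketch below builds the relative path under which a SymStore-style store commonly keeps a PDB: the PDB name, then a folder made of the PDB's GUID (32 hex digits, no dashes) followed by its age in hexadecimal, then the PDB name again. SymbolStorePdbPath is a hypothetical helper written for this illustration and is not part of Superluminal; the uppercase formatting of the age is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds "<pdb name>/<GUID><age>/<pdb name>", relative to the store root.
// guidHex is expected to be the PDB GUID as 32 hex digits without dashes.
std::string SymbolStorePdbPath(const std::string& pdbName,
                               const std::string& guidHex,
                               std::uint32_t age)
{
    // The age is appended in hexadecimal without leading zeros.
    // Uppercase is assumed here.
    char ageHex[9];
    std::snprintf(ageHex, sizeof(ageHex), "%X", static_cast<unsigned>(age));

    return pdbName + "/" + guidHex + ageHex + "/" + pdbName;
}
```

For example, a PDB named MyGame.pdb with GUID 0123456789ABCDEF0123456789ABCDEF and age 1 would be looked up at MyGame.pdb/0123456789ABCDEF0123456789ABCDEF1/MyGame.pdb below the store root, whether that root is a local directory, a network share or an HTTP(S) URL.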

Local directories & Network shares

A local directory or network share can be used to load symbols, even if it's not in the Symbol Store format. These directories should contain flat lists of PDBs and/or image files. Symbols or image files in this directory will only be used if they match the signature of the required PDB/image.

Source Server / Source Indexing

If your PDBs are correctly source indexed, Superluminal will retrieve source files through your source server when required by the Source view. Superluminal supports both regular source servers (i.e. retrieved through source control such as Perforce) and HTTP(S)-based source servers.

See the Windows documentation for more information about source indexing.


Auto Update

When the application is launched, an update check is performed to see if there are new versions available. This happens at most once a day. To manually search for updates, you can select 'Help/Check for updates' from the main menu. It is also possible to change the auto update settings by selecting Tools/Settings, and then selecting the 'Auto Update' tab:

You can disable automatic updates here. Early adopters can also opt in to the Insider releases. Those releases contain the latest features, which have not been tested as extensively as the stable builds.

In case you want to switch back to a previous stable release of Superluminal, go to our downloads page. Any previous release can just be downloaded and installed over a newer release.

Third party software

Superluminal Performance uses various third party software. To view a list of all used third party software and their licenses, please select Help/About from the main menu.