SPEC CPU2006 Benchmark Tools

Some testers may also wish to specify the exact identifier of the version actually used in the test, for example a specific build number. Such additional identifiers may aid in later result reproduction, but are not required; the key point is to include the name that customers will be able to use to order the component.

The configuration disclosure includes fields for both "Hardware Availability" and "Software Availability". In both cases, the date that must be used is the date of whichever component of the respective type is the last to become generally available.
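
In a config file, these dates typically appear as the hw_avail and sw_avail header fields; a minimal sketch, assuming the CPU2006 field names and purely illustrative dates:

    # last hardware component to become generally available:
    hw_avail = Jun-2007
    # last software component to become generally available:
    sw_avail = Aug-2007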

If a software or hardware date changes, but still falls within 3 months of first publication, a result page may be updated on request to SPEC.

If a software or hardware date changes to more than 3 months after first publication, the result is considered non-compliant. SPEC is aware that performance results for pre-production systems may sometimes be subject to change, for example when a last-minute bugfix reduces the final performance. For results measured on pre-production systems, if the tester becomes aware of something that will reduce production system performance by more than a small amount, the result is expected to be corrected.

The following sections describe the various elements that make up the disclosure of the system configuration tested.

The SPEC tools allow setting this information in the configuration file, prior to starting the measurement. It is also acceptable to update the information after a measurement has been completed, by editing the rawfile. Rawfiles include a marker that separates the user-editable portion from the rest of the file. There is information about rawfile updating in the rawformat section of the utility documentation.
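
For example, after hand-editing the user-editable portion of a rawfile, the reports can be regenerated with the rawformat facility; a sketch, assuming the invocation described in the CPU2006 utility documentation (verify the exact option spellings against your kit; the rawfile name is illustrative):

    runspec --rawformat --output_format=txt,html CPU2006.001.rsf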

SPEC recommends that measurements be done on the actual systems for which results are claimed. Nevertheless, SPEC recognizes that there is a cost of benchmarking, and that multiple publications from a single measurement may sometimes be appropriate.

For example, two systems badged as "Model A" versus "Model B" may differ only in the badge itself; in this situation, differences are sometimes described as only "paint deep", and a tester may wish to perform only a single test (i.e., measure just one of the two models and publish results for both).

Although paint is usually not a performance-relevant difference, for other differences it can be difficult to draw a precise line as to when two similar systems should no longer be considered equivalent.

For example, what if Model A and Model B come from different vendors? Use differing firmware, power supplies, or line voltage? Support additional types or numbers of disks, busses, interconnects, or other devices? For SPEC CPU, a single measurement may be published as multiple equivalent results provided that all of the following requirements are met. Performance differences from factors such as those listed in the paragraph above (paint, vendor, firmware, and so forth) must be within normal run-to-run variation.

Due to space and thermal considerations, the Model B can only be half-populated; i.e., only half of its DIMM slots may be filled. If the actual system under test is the Model A, the tester must fill only the DIMM slots that are allowed to be filled for both systems. Disclosures must reference each other, and must state which system was used for the actual measurement, for example: "This result was measured on the Acme Model A."

In addition, SPEC encourages use of the CPU Name field to make it easier for the reader to identify a processor, even if the processor choice is not, technically, ambiguous.

For results that are published on its web site, SPEC is likely to use this field to note CPU technical characteristics that SPEC may deem useful for queries, and may adjust its contents from time to time.

Some processor differences may not be relevant to performance, such as differences in packaging, distribution channels, or CPU revision levels that have only a negligible effect on a SPEC CPU overall performance metric. In those cases, SPEC does not require disambiguation as to which processor was tested. For example, when first introduced, the TurboBlaster series is available with only one instruction set, and runs at speeds up to 2GHz.

Later, a second instruction set known as "Arch2" is introduced, and older processors are commonly, but informally, referred to as having employed "Arch1", even though they were not sold with that term at the time. Chips with Arch2 are sold at speeds of 2GHz and higher. The manufacturer has chosen to call both Arch1 and Arch2 chips by the same formal chip name, TurboBlaster.

Since the formal chip name is the same, and since both Arch1 and Arch2 are available at 2GHz, a 2GHz result should say which instruction set was used. For a chip sold only at speeds above 2GHz, there is technically no ambiguity, since such speeds are available only with Arch2. Nevertheless, the tester is encouraged to note that the chip uses Arch2, to help the reader disambiguate the processors. As an aid to technical readers doing queries, SPEC may decide to adjust all the TurboBlaster results that have been posted on its website by adding either "Arch1" or "Arch2" to all posted results.
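
One way a tester might note this is directly in the CPU Name field; a sketch assuming the CPU2006 field name (the processor name is the hypothetical one from this example):

    hw_cpu_name = TurboBlaster (Arch2)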

The TurboBlaster is also sold in both an OEM version and a Consumer version; but these are both the same chip running at the same speed. In this case, it is not necessary to specify whether the OEM or Consumer version was tested.

CPU MHz: a numeric value expressed in megahertz. That is, do not state the speed in GHz; give the number of megahertz. The value here is to be the speed at which the CPU is run, even if the chip itself is sold at a different clock rate.

That is, if you "over-clock" or "under-clock" the part, disclose here the actual speed used.

Number of CPUs in System. As of this writing, it is assumed that processors can be described as containing one or more "chips", each of which contains some number of "cores", each of which can run some number of hardware "threads". Fields are provided in the results disclosure for each of these. If industry practice evolves such that these terms are no longer sufficient to describe processors, SPEC may adjust the field set.
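
A sketch of the corresponding config-file fields, assuming the CPU2006 field names (the values are purely illustrative):

    # actual clock rate used, in MHz, even if the part is over- or under-clocked
    hw_cpu_mhz = 2000
    # enabled chips, total enabled cores, enabled cores per chip, threads per core
    hw_nchips = 4
    hw_ncores = 8
    hw_ncoresperchip = 2
    hw_nthreadspercore = 1
    # orderable increments, in the units a customer would use
    hw_ncpuorder = 1, 2, 4 chips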

Regarding the fields in the above list that mention the word "enabled": if a chip, core, or thread is available for use during the test, then it must be counted. If one of these resources is disabled - for example by a firmware setting prior to boot - then it need not be counted, but the tester must exercise due diligence to ensure that disabled resources are truly disabled, and not silently giving help to the result.

For example, suppose a test is run using only a few copies on a system where each chip has 2 cores and each core can run 4 hardware threads. Even though they are now only lightly loaded, all the above resources are still configured into the SUT; therefore the SUT must still be described as having all of its chips, cores, and hardware threads enabled.

The system is halted, and firmware commands are entered to disable all but 3 of the chips. All resources are available on the remaining 3 chips. The system is rebooted and a copy test is run once more. This time, the resources are 3 chips, 6 cores, and 24 hardware threads.

The system is halted, and firmware commands are entered to enable 24 chips; but only 1 core is enabled per chip, and hardware threading is turned off. The system is booted, and a copy test is run. The resources this time are 24 chips, 24 cores, and 24 hardware threads. Note: if resources are disabled, the method(s) used for such disabling must be documented and supported.
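
A sketch of how this last scenario might be disclosed, again assuming the CPU2006 field names:

    hw_nchips = 24
    hw_ncores = 24
    hw_ncoresperchip = 1
    hw_nthreadspercore = 1
    # the notes must also document the firmware settings used to disable
    # the second core on each chip and to turn off hardware threading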

Number of CPUs orderable: specify the number of processors that can be ordered, using whatever units the customer would use when placing an order. Level 1 (primary) cache: size, location, and number of instances. Memory: performance-relevant information as to the memory configuration must be included, either in the field or in the notes section.

If there is one and only one way to configure memory of the stated size, then no additional detail need be disclosed. But if a buyer of the system has choices to make, then the result page must document the choices that were made by the tester. For example, the tester may need to document the number of memory carriers, the size of DIMMs, banks, interleaving, access time, or even the arrangement of modules: which sockets were used, which were left empty, and which sockets had the bigger DIMMs.
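
For instance, a memory description at roughly this level of detail might appear in the hw_memory field (field name per the CPU2006 config syntax; the DIMM parts and slot names are invented):

    hw_memory = 16 GB (8 x 2 GB DDR2-667 DIMMs; slots A1-B4 populated, C1-C4 empty)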

Exception: if the tester has evidence that a memory configuration choice does not affect performance, then SPEC does not require disclosure of the choice made by the tester. Disks: the disk used for the run must be described; if other disks are also performance relevant, then they must also be described. System State: On Linux systems with multiple run levels, the system state must be described by stating the run level and a very brief description of the meaning of that run level, for example:
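
A hypothetical entry of the kind this rule asks for (the run level and its meaning here are the conventional SysV Linux ones, used purely as an illustration):

    System State: Run level 3 (multi-user with networking)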

If the system is installed and booted using default options, document the System State as "Default". If the system is used in a non-default mode, document the system state using the vocabulary appropriate to that system for example, "Safe Mode with Networking", "Single User Mode". Note: some Unix and Unix-like systems have deprecated the concept of "run levels", preferring other terminology for state description.

In such cases, the system state field should use the vocabulary recommended by the operating system vendor.

Note 1: It is acceptable for library functions (for example, math routines) to be implemented in a manner that spreads their work across multiple cores or threads. If one or more library functions are used in this manner, that counts as auto parallelization, for purposes of this field.

Note 2: sometimes libraries are referred to as "thread safe" or "SMP safe" when implemented in a manner that allows multiple calling threads from a single process. Such an implementation is not alone enough to require setting the field to "yes"; the point is whether the library routine itself causes multiple threads of work to be generated. Note 3: incidental operating system usage of hardware resources due to interrupt processing and system services does not count as "Auto Parallelization" for purposes of this field.

Of course, all available CPU resources must be disclosed, as described in the rules above.

Scripted Installations and Pre-configured Software: In order to reduce the cost of benchmarking, test systems are sometimes installed using automatic scripting, or installed as preconfigured system images. A tester might use a set of scripts that configure the corporate-required customizations for IT Standards, or might install by copying a disk image that includes Best Practices of the performance community.

SPEC understands that there is a cost to benchmarking, and does not forbid such installations, with the proviso that the tester is responsible to disclose how end users can achieve the claimed performance using appropriate fields above.

Example: the Corporate Standard Jumpstart Installation Script has 73 documented customizations plus additional undocumented customizations, 34 of which no one remembers. The tester is nevertheless responsible for finding and documenting all of the performance-relevant changes. Therefore, to remove doubt, the tester prudently decides that it is less error-prone and more straightforward to simply start from customer media, rather than the Corporate Jumpstart.

System Services: If performance-relevant system services or daemons are shut down, the change must be disclosed. Incidental services that are not performance relevant may be shut down without being disclosed, such as the print service on a system with no printers attached.

The tester remains responsible for the results being reproducible as described. The meaning of the settings must also be described, in either the free-form notes or in the flags file. The tuning parameters must be documented and supported. Sometimes the spelling of a command actually used by the tester differs from the spelling that will be recommended to customers; in this case, the flags report will include the actual spelling used by the tester, but a note should be added to document the spelling that will be recommended for customers.
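
A purely hypothetical illustration of such a note (the kernel parameter and the customer-facing tool named here are invented for the example):

    # all names below are invented for illustration
    notes_os_000 = The tester set the kernel parameter with:
    notes_os_005 =     sysctl -w kernel.acme_prefetch=2
    notes_os_010 = Customers should instead use the supported command:
    notes_os_015 =     acme-tune --prefetch=aggressive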

Compilation flags are detected and reported by the tools with the help of "flag description files". Such files provide information about the syntax of flags and their meaning. Flags file required: A result will be marked "invalid" unless it has an associated flag description file. Flags description files are not limited to compiler flags. Although these descriptions have historically been called "flags files", flag description files are also used to describe other performance-relevant options.
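
A result is tied to its flag description file via the config file; a minimal sketch, assuming the CPU2006 "flagsurl" option (the URL is hypothetical):

    flagsurl = http://www.spec.org/cpu2006/flags/Acme-Compiler-V1.xml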

Notes section or flags file? As mentioned above, tuning information may appear in either place. In general, it is recommended that the result page should state what tuning has been done, and the flags file should state what it means. As an exception, if a definition is brief, it may be more convenient, and it is allowed, to simply include the definition in the notes section.

Required detail: The level of detail in the description of a flag is expected to be sufficient so that an interested technical reader can form a preliminary judgment of whether he or she would also want to apply the option. This requirement is phrased as a "preliminary judgment" because a complete judgment of a performance option often requires testing with the user's own application, to ensure that there are no unintended consequences. At minimum, if a flag has implications for safety, accuracy, or standards conformance, such implications must be disclosed.

When --algebraII is used, the compiler is allowed to use the rules of elementary algebra to simplify expressions and perform calculations in an order that it deems efficient. This flag allows the compiler to perform arithmetic in an order that may differ from the order indicated by programmer-supplied parentheses. The final sentence of the preceding paragraph is an example of a deviation from a standard which must be disclosed.

Description of Feedback-directed optimization: If feedback-directed optimization is used, the description must indicate how the training runs gather their profile information. Hardware performance counters are often available to provide information such as branch mispredict frequencies, cache misses, or instruction frequencies. If they are used during the training run, the description needs to note this; but SPEC does not require a description of exactly which performance counters are used.
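
For reference, a minimal sketch of what a two-pass feedback-directed build looks like in a config file, assuming a GCC-style compiler (PASS1/PASS2 variables per the CPU2006 config syntax; the flag spellings belong to the assumed compiler):

    PASS1_CFLAGS  = -fprofile-generate
    PASS1_LDFLAGS = -fprofile-generate
    PASS2_CFLAGS  = -fprofile-use
    PASS2_LDFLAGS = -fprofile-use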

As with any other optimization, if the optimizations performed have effects regarding safety, accuracy, or standards conformance, these effects must be described. Flag file sources: It is acceptable to build flags files using previously published results, or to reference a flags file provided by someone else (for example, a compiler vendor). Doing so does not relieve an individual tester of the responsibility to ensure that his or her own result is accurate, including all its descriptions.

SPEC CPU results are for systems, not just for chips: it is required that a user be able to obtain the system described in the result page and reproduce the result (within a small range for run-to-run variation). For some suppliers, the performance-relevant hardware components typically are the CPU chip, motherboard, and memory; but users would not be able to reproduce a result using only those three.

To actually run the benchmarks, the user has to supply other components, such as a case, power supply, and disk; perhaps also a specialized CPU cooler, extra fans, a disk controller, graphics card, network adapter, BIOS, and configuration software. Such systems are sometimes referred to as "white box", "home built", "kit built", or by various informal terms.

For SPEC purposes, the key point is that the user has to do extra work in order to reproduce the performance of the tested components; therefore, this document refers to such systems as "user built". For user built systems, the configuration disclosure must supply a parts list sufficient to reproduce the result.

As of the listed availability dates in the disclosure, the user should be able to obtain the items described in the disclosure, spread them out on an anti-static work area, and, by following the instructions supplied with the components, plus any special instructions in the SPEC disclosure, build a working system that reproduces the result.

It is acceptable to describe components using a generic name rather than a specific brand or part number. Component settings that are listed in the disclosure must be within the supported ranges for those components. For example, if the memory timings are manipulated in the BIOS, the selected timings must be supported for the chosen type of memory.

For example, SPEC CPU benchmark scores are affected by memory speed, and motherboards often support more than one choice of memory; therefore, the choice of memory type is performance-relevant. By contrast, the motherboard needs to be mounted in a case. Which case is chosen is not normally performance-relevant; it simply has to be the correct size for the chosen motherboard and components. Performance-relevant components must be described in the "Configuration Disclosure" fields (see the rules above).

If more detail is needed beyond what will fit in the fields, add more information under the free-form notes. Note 2: Regarding power modes: Sometimes CPU chips are capable of running with differing performance characteristics according to how much power the user would like to spend.

If non-default power choices are made for a user built system, those choices must be documented in the notes section. Note 3: Regarding cooling systems: Sometimes CPU chips are capable of running, but with degraded performance, if the cooling system (fans, heatsinks, etc.) is not adequate.

When describing user built systems, the notes section must describe how to provide cooling that allows the chip to achieve the measured performance.

It was mentioned in section 2 that it is allowed to build on a different system than the system under test. This section describes when and how to document such builds. If all components of the build environment are available for the run environment, and if both belong to the same product family and are running the same operating system versions, then this is not considered a cross-compilation.

The fact that the binaries were built on a different system than the run-time system does not need to be documented. If the software used to build the benchmark executables is not available on the SUT, or if the host system provides performance gains via specialized tuning or hardware not available on the SUT, then the host system(s) and software used for the benchmark building process must be documented.
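
A hypothetical note of the sort that would satisfy this requirement (the system and compiler names are invented; the notes_comp naming follows the CPU2006 structured-notes convention):

    notes_comp_000 = Binaries were built on an Acme Model C (AcmeOS 10.2,
    notes_comp_005 = Acme C/C++ 9.1) and copied to the system under test.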

Sometimes, the person building the benchmarks may not know which of the two previous paragraphs applies, because the benchmark binaries and config file are redistributed to other users who run the actual tests. In this situation, the build environment must be documented.

The actual test results consist of the elapsed times and ratios for the individual benchmarks and the overall SPEC metric produced by running the benchmarks via the SPEC tools.

The required use of the SPEC tools ensures that the results generated are based on benchmarks built, run, and validated according to the SPEC run rules.

All runs of a specific benchmark when using the SPEC tools are required to have validated correctly. The benchmark executables must have been built according to the rules described in section 2 above. The SPECrate throughput metrics are calculated based on the execution of benchmark binaries that are built using the same rules as binaries built for SPECspeed metrics.

However, the tester may select the number of concurrent copies of each benchmark to be run. The same number of copies must be used for all benchmarks in a base test. This is not true for the peak results where the tester is free to select any combination of copies.

The number of copies selected is usually a function of the number of CPUs in the system. As with the SPECspeed metric, all copies of the benchmark during each run are required to have validated correctly.
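
For example, a tester might run 8 copies of every benchmark in a base rate run; a sketch using the standard config option and runspec invocation (the copy count and config-file name are illustrative):

    # in the config file:
    copies = 8
    # then run the integer suite as a rate test:
    runspec --config=acme.cfg --rate int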

The reverse is also permitted. As mentioned above, performance may sometimes change for pre-production systems; but this is also true of production systems (that is, systems that have already begun shipping). For example, a later revision to the firmware, or a mandatory OS bugfix, might reduce performance.

For production systems, if the tester becomes aware of something that reduces performance by more than a small amount, the tester is encouraged to publish a new result. In such cases, the original result is not considered non-compliant. The tester is also encouraged, but not required, to include a reference to the change that makes the results different (for example, a pointer to the relevant bugfix).

Publication of peak results is considered optional by SPEC, so the tester may choose to publish only base results.

Since by definition base results adhere to all the rules that apply to peak results, the tester may choose to refer to these results by either the base or peak metric names (e.g., SPECint_base2006 or SPECint2006). It is permitted to publish base-only results.

Operation on Windows is substantially similar; just provide the relative paths with backslashes instead of forward slashes.

Please submit the resulting tarfile to SPEC for review, along with the recording of your tool build session. SPEC will review your tools, and assuming that they pass review, will add the tools you have built to its patch library, for possible distribution to future users of your interesting new architecture. NOTE 1: If your operating system is unable to execute the packagetools script, please have a look at what the script does and enter the corresponding commands by hand.

Again, you will need to submit the results to SPEC. NOTE 2: Be sure to test your packaged tools on a different system, preferably one with a different disk layout. If the destination system is unable to invoke libperl, that points to an unintended dependency on the build system, which needs to be fixed before the tools are submitted. Testing a newly-built toolset on the system where it was built is not enough to ensure basic sanity of the tools.
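
A sketch of such a cross-machine check, assuming a standard SPEC CPU2006 kit with your newly packaged toolset added to it (the paths below are examples; the goal is simply to confirm that the packaged tools run at all on a machine other than the one that built them):

    # on a second, differently configured system:
    cd /home/tester/kit
    ./install.sh                 # the standard SPEC CPU2006 installation script
    cd /home/tester/cpu2006      # wherever you chose to install
    . ./shrc                     # set up the SPEC environment
    runspec --help               # quick check for missing shared libraries, etc.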

Test for unintended dependencies by installing on an entirely different system, for example as sketched above.

If something goes wrong, unfortunately, you're probably just going to have to take the build apart and figure out what is failing. Here are some hints on how to go about doing that. If something goes wrong, you probably do NOT want to make some random adjustment (reinstall a compiler, fix an environment variable, adjust your path) and then start all over again.

That's going to be painful and take a lot of your time. Instead, you should temporarily abandon the buildtools script at that point and just try to build the offending tool, until you understand exactly why that particular tool is failing. Consider turning on verbose diagnostics if your system has a way to do that. Make a huge terminal window, so that long error messages remain visible.

Read what the buildtools output tells you. For example, you might try building just the offending tool by hand, as sketched below, to see exactly where it fails. Then try fixing that environment variable or reinstalling that compiler, and rebuild the single tool. Does it look better? If not, have a close look at the error messages and the Makefile. Does the Makefile use a feature that is not present in your version of make? If so, can you get it to work with GNU make? Note that for the GNU configure based tools (everything except Perl and its modules) you may specify your compiler by setting the CC environment variable.
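
A sketch of such a hand build (the tool directory name is an example; substitute whichever tool is failing for you):

    cd $SPEC/tools/src/make-3.80       # or whichever tool's source directory failed
    CC=/usr/bin/cc ./configure         # CC is honored by the GNU configure based tools
    make 2>&1 | tee /tmp/onetool.log   # keep the complete error output for study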

If you want to see more about what buildtools is doing for you, turn on your shell's verbose mode. If version X.Y of a tool just won't build on your operating system, you might check whether there is a newer version available upstream. If so, download the new version to a scratch directory outside of the SPEC tree and try building it there. If that version succeeds, try to deduce why.

Narrow it down to a one-line fix, won't you please? Then tell SPEC that you'd like the same one-line fix applied to its variant of the tool.
