CPU2.0
General Purpose Library for x64 CPU


CPU2.0 is a low-level helper library offering the basic routines assembly programmers need to interface with the x64 CPU in a controlled, command-line environment. It is specifically tailored for beginners, hobbyists and self-learners of x64 assembly language programming, with the aim of easing the steep learning curve. CPU2.0 exposes the CPU state, with an accurate picture of the machine state, at any point of insertion in a programmer's source. This makes CPU2.0 a very effective set of tools for learning, teaching and other system purposes.

Revision
Download
Donate*

1.0.5
Date: 20.DEC.2018

For older versions (BASELIB/CPULIB, Revision 4.1.6), you can download them here
baselibs | cpulib
(Note these two older libraries are no longer maintained)

*If you find our libraries helpful, please consider donating to our effort to maintain, enhance and improve this library.

CPU2.0 comes in two flavors:

  • CPU2.0 for Linux64 (cpu2.o, cpu2.so.1.0) – For the Linux64 platform, observing the System V AMD64 ABI and calling convention. Contains about 260 helper routines.
  • CPU2.0 for Win64 (cpu2.obj, cpu2.dll) – For the Win64 platform, observing the MS64 ABI and calling convention. Contains about 260 helper routines.

    Both versions of CPU2.0 offer similar functionality but use different calling conventions. CPU2.0 is also accessible from high-level languages such as C and C++ running in 64-bit mode on an x64 CPU.

    The following sets of examples demonstrate how to access and use CPU2.0 via various assemblers and linkers on both the Win64 and Linux64 platforms.


    Examples: Vector Applications


    A strong point of CPU2.0 is its ability to deal with vector SIMD (MMX, SSE, AVX) instructions down to byte-granularity operations. This is the only low-level library that lets you actively view, step by step and in gory detail, how such operations work.

    For instance, suppose you want to see how saturated arithmetic is implemented by the MMX saturating instruction paddusb. Here is a Win64 example using FASM (assembler), GoLink (linker) and cpu2.dll from CPU2.0:

        ;Compile >> fasm this.asm
        ;Link >> golink /console this.obj cpu2.dll
        format MS64 COFF
        public start
    
        extrn dumpmmxf
        extrn prnline
        extrn exitx
    
        section '.data' data readable writeable
        x db 254,0,254,127,254,254,253,250
        y db 3,255,2,129,8,-2,4,6
    
        section '.text' code readable executable
        start:
                sub     rsp,40
    
                movq    mm0,qword[x]  ;populate MM0
                movq    mm7,qword[y]  ;populate MM7
    
                ;Initial view
                mov     rcx,0         ;view as unsigned bytes
                call    dumpmmxf      ;view formatted (aligned layout)
                call    prnline
    
                ;saturated add against packed unsigned bytes
                paddusb mm0,mm7
    
                ;after unsigned saturated arithmetics
                mov     rcx,0
                call    dumpmmxf
    
                call    exitx

    This should yield the output below, where every byte in MM0 saturates at its maximum value (255) after the bytes from MM7 are added, instead of wrapping around. The output:

        mm0: 250|253|254|254|127|254|  0|254|	;Initial MMX registers
        mm1:   0|  0|  0|  0|  0|  0|  0|  0|
        mm2:   0|  0|  0|  0|  0|  0|  0|  0|
        mm3:   0|  0|  0|  0|  0|  0|  0|  0|
        mm4:   0|  0|  0|  0|  0|  0|  0|  0|
        mm5:   0|  0|  0|  0|  0|  0|  0|  0|
        mm6:   0|  0|  0|  0|  0|  0|  0|  0|
        mm7:   6|  4|254|  8|129|  2|255|  3|
    
        mm0: 255|255|255|255|255|255|255|255|	;After saturated add, mm0=mm0+mm7
        mm1:   0|  0|  0|  0|  0|  0|  0|  0|
        mm2:   0|  0|  0|  0|  0|  0|  0|  0|
        mm3:   0|  0|  0|  0|  0|  0|  0|  0|
        mm4:   0|  0|  0|  0|  0|  0|  0|  0|
        mm5:   0|  0|  0|  0|  0|  0|  0|  0|
        mm6:   0|  0|  0|  0|  0|  0|  0|  0|
        mm7:   6|  4|254|  8|129|  2|255|  3|
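
    To make the saturation rule explicit, here is a minimal Python sketch of what paddusb does to each byte lane. It is only a model for illustration: the function name paddusb is borrowed from the instruction and is not a CPU2.0 routine. The data is the same as x and y in the listing.

```python
def paddusb(a, b):
    """Add two sequences of unsigned bytes lane by lane, clamping each sum at 255."""
    return [min(x + y, 255) for x, y in zip(a, b)]

# Same data as x and y in the listing (-2 is stored as the byte 254).
x = [254, 0, 254, 127, 254, 254, 253, 250]
y = [3, 255, 2, 129, 8, 254, 4, 6]

saturated = paddusb(x, y)
# For comparison: a plain wraparound add (as PADDB would do) keeps only the low 8 bits.
wrapped = [(p + q) & 0xFF for p, q in zip(x, y)]

print(saturated)
print(wrapped)
```

    Every lane of saturated comes out as 255, matching the second dump, while the wraparound version would leave small, misleading values in most lanes.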

    Or maybe MMX instructions are too outdated for you? Perhaps you want to see how square roots can be computed for four single-precision values at once via the SSE sqrtps instruction. CPU2.0 is there for you. Using FASM, the ld linker and cpu2.o (Linux64 version) as the main library:

        ;------------------------------
        ; fasm this.asm
        ; ld this.o cpu2.o -o this
        ; ./this
        ;------------------------------
        format ELF64
        public _start
    
    
        extrn dumpxmmrf
        extrn prnline
        extrn exitx
    
    
        section '.data' writeable align 16
        align 16
        x dd 45.45,81.0,34.23,42.01
    
    
        section '.text' executable
        _start:
                sub     rsp,8		;alignment to 16
    
                movdqu  xmm0,dqword[x]  	;populate XMM0
    
                ;Initial view
                mov     rdi,8         	;view as packed singles
                call    dumpxmmrf     	;view XMMs in reversed, formatted
                call    prnline
    
                ;SQRTPS at work
                sqrtps  xmm1,xmm0
    
                ;See the result, in XMM1
                mov     rdi,8
                call    dumpxmmrf
    
                call    exitx

    Output (reversed, formatted)

         xmm0:        +45.45|        +81.0|       +34.23|       +42.01|
         xmm1:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm2:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm3:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm4:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm5:          +0.0|         +0.0|         +0.0|         +0.0|
        --snip--
    
         xmm0:        +45.45|        +81.0|       +34.23|       +42.01|
         xmm1:     +6.741661|         +9.0|    +5.850641|    +6.481512|
         xmm2:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm3:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm4:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm5:          +0.0|         +0.0|         +0.0|         +0.0|
        --snip--

    Running the same code on Win64, using ld and cpu2.dll, and this time with a forward view of the XMM registers instead of the reversed view, yields:

        ;------------------------------
        ; fasm this.asm
        ; ld this.obj cpu2.dll -o this
        ;------------------------------
        format MS64 COFF
        public _start
    
        extrn dumpxmmf
        extrn prnline
        extrn exitx
    
        section '.data' readable writeable align 16
        align 16
        x dd 45.45,81.0,34.23,42.01
    
        section '.text' readable executable
        _start:
                sub     rsp,40
    
                movdqa  xmm0,dqword[x]  	;populate XMM0
    
                ;Initial view
                mov     rcx,8         	;view as packed singles
                call    dumpxmmf     	;view formatted
                call    prnline
    
                ;SQRTPS at work
                sqrtps  xmm1,xmm0
    
                ;See the result, in XMM1
                mov     rcx,8
                call    dumpxmmf
    
                call    exitx

    Output (forward and formatted)

         xmm0:        +42.01|       +34.23|        +81.0|       +45.45|
         xmm1:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm2:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm3:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm4:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm5:          +0.0|         +0.0|         +0.0|         +0.0|
         --snip--
    
         xmm0:        +42.01|       +34.23|        +81.0|       +45.45|
         xmm1:     +6.481512|    +5.850641|         +9.0|    +6.741661|
         xmm2:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm3:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm4:          +0.0|         +0.0|         +0.0|         +0.0|
         xmm5:          +0.0|         +0.0|         +0.0|         +0.0|
        --snip--

    Or maybe you want to know whether your PC supports the AVX instruction set, by trying to shuffle a few packed doubles? This time using ML64, GCC64 and cpu2.obj:

        ;------------------------------
        ; ml64 /c this.asm
        ; gcc -m64 this.obj cpu2.obj -s -o this
        ;------------------------------
        option casemap:none
    
        externdef dumpymm:proc          ;These are from cpu2.obj
        externdef prnline:proc
        externdef ExitProcess:proc      ;This is from kernel32.dll
    
        dseg segment page 'DATA'
        align 32
        x dq 204.561,891.211,-23.892,102.389
        y dq 150.167,170.188,290.313,310.324
        dseg ends
    
        cseg segment para 'CODE'
    
        main proc
                sub     rsp,40
                vmovdqu ymm0,ymmword ptr x  ;populate YMM0
                vmovdqu ymm1,ymmword ptr y  ;populate YMM1
    
                ;Initial view
                mov     rcx,9        	   ;view as packed doubles
                call    dumpymm      	   ;view unformatted,forward
                call    prnline
    
                ;Packed doubles shuffling
                vshufpd ymm5,ymm1,ymm0,2
    
                ;See the shuffled result, in YMM5
                mov     rcx,9
                call    dumpymm
    
                xor     ecx,ecx
                call    ExitProcess
        main endp
    
        cseg ends
        end

    Output (unformatted)

         ymm0: 102.389|-23.892|891.211|204.561|
         ymm1: 310.324|290.313|170.188|150.167|
         ymm2: 0.0|0.0|0.0|0.0|
         ymm3: 0.0|0.0|0.0|0.0|
         ymm4: 0.0|0.0|0.0|0.0|
         ymm5: 0.0|0.0|0.0|0.0|
         ymm6: 0.0|0.0|0.0|0.0|
        --snip--
    
         ymm0: 102.389|-23.892|891.211|204.561|
         ymm1: 310.324|290.313|170.188|150.167|
         ymm2: 0.0|0.0|0.0|0.0|
         ymm3: 0.0|0.0|0.0|0.0|
         ymm4: 0.0|0.0|0.0|0.0|
         ymm5: -23.892|290.313|891.211|150.167|
         ymm6: 0.0|0.0|0.0|0.0|
        --snip--
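
    The lane selection performed by vshufpd with an immediate of 2 can be hard to follow from the dump alone. The Python sketch below models the 256-bit form of the instruction on plain lists; vshufpd256 is a hypothetical helper written for this page, not part of CPU2.0. Lists are ordered lowest lane first, and the result is printed highest lane first to match the forward view.

```python
def vshufpd256(src1, src2, imm8):
    """Model 256-bit VSHUFPD on lists of four doubles (index 0 = lowest lane)."""
    out = []
    for half in (0, 2):                       # each 128-bit half is handled separately
        bit_lo = (imm8 >> half) & 1           # picks low/high double of src1 in this half
        bit_hi = (imm8 >> (half + 1)) & 1     # picks low/high double of src2 in this half
        out.append(src1[half + bit_lo])
        out.append(src2[half + bit_hi])
    return out

ymm0 = [204.561, 891.211, -23.892, 102.389]   # x, lowest lane first
ymm1 = [150.167, 170.188, 290.313, 310.324]   # y, lowest lane first

# Same operand order as: vshufpd ymm5,ymm1,ymm0,2
ymm5 = vshufpd256(ymm1, ymm0, 2)
print("|".join(str(v) for v in reversed(ymm5)))
```

    Only bit 1 of the immediate is set, so the second lane takes the high double of ymm0's lower half (891.211) while every other lane takes the low double of its half.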

    The four programs above demonstrate the flexibility CPU2.0 offers in viewing any MMX, SSE or AVX register in either direction, at multiple granularities, and in formatted or unformatted layouts. CPU2.0 is the first to offer this kind of flexibility. They also demonstrate that CPU2.0 runs seamlessly on both platforms due to its low-level nature.


    Examples: Floating-Point and FPU


    While the previous examples deal with floating-point values from an application standpoint, CPU2.0 also provides a full set of FPU helper routines that double as floating-point educational material. These include fpdinfo, fpfinfo, fpu_stack, fpbind, fpbin, fpu_sflag, fpu_tag, fpu_cflag and the like. The following examples show how the CPU, which is IEEE-754 compliant, maintains both double-precision and single-precision data in the three-part floating-point binary format. Those parts are:

  • One sign bit
  • Exponent bits
  • Mantissa/Significand bits

    In addition, there is a hidden bit implicitly embedded in a floating-point quantity of the CPU/FPU. All of this information is revealed by fpdinfo (for double precision) and fpfinfo (for single precision). The example below is for Win64, via NASM, using ld as the linker and our library cpu2.dll.

        ;------------------------
        ; nasm -f win64 this.asm
        ; ld this.obj cpu2.dll -o this
        ;------------------------
                global _start
    
                ;import these from cpu2.dll
                extern fpdinfo
                extern fpfinfo
                extern prnstrz
                extern prnline
                extern exitx
    
                section .data
        msg1:   db 'IEEE 754 float64 format for: ',0ah,0
        msg2:   db 'IEEE 754 float32 format for: ',0ah,0
        myDbl:  dq -786.561        ;a REAL8
        myFlt:  dd 0.009239123     ;a REAL4
    
                section .text
        _start:
                sub     rsp,40
    
                mov     rcx,msg1        ;Display message
                call    prnstrz
                movq    xmm0,[myDbl]    ;View real8 FP format
                call    fpdinfo
    
                call    prnline
    
                mov     rcx,msg2
                call    prnstrz
                movd    xmm0,[myFlt]    ;View real4 FP format
                call    fpfinfo
    
                call    exitx

    Output:

        IEEE 754 float64 format for:
        -786.561 = C088947CED916873
        1.10000001000.1000100101000111110011101101100100010110100001110011
        S.EXPONT   +1.MANT
        SIGN: 1
        EXP : +9 (1032 - 1023)
        MANT: 1.536251953124999
    
        IEEE 754 float32 format for:
        0.009239 = 3C175FB1
        0.01111000.00101110101111110110001
        S.EXPONT 1.MANT
        SIGN: 0
        EXP : -7 (+120 - 127)
        MANT: 1.182608

    All of this technical information revealed by the two routines is important in the study of floating-point binary formats.

    For example, the net exponent tells you how many of the leading mantissa bits, together with the hidden bit, are used to represent the integral part of your floating-point quantity.

    From the example above, the double value -786.561 has a net exponent of +9. That means the highest 9 bits of the mantissa, preceded by the 1 hidden bit, form the integer 786, while the rest of the bits represent the fractional part.

        0.100010010....Higher 9 bits of mantissa
        1		   +1 hidden bit
        1 100010010b   Sums up to 786
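
    The same breakdown can be cross-checked outside of assembly. The following Python sketch, which uses only the standard struct module, splits a double into its three fields and rebuilds the integral part from the hidden bit and the leading mantissa bits. The helper decode_double is a hypothetical name for illustration, and the integral-part step assumes a normal number with a net exponent between 0 and 52.

```python
import struct

def decode_double(value):
    """Split an IEEE 754 double into sign, biased exponent and fraction bits."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exp, fraction

sign, biased_exp, fraction = decode_double(-786.561)
net_exp = biased_exp - 1023          # remove the float64 bias

# Hidden bit followed by the top net_exp fraction bits gives the integral part.
integral = (1 << net_exp) | (fraction >> (52 - net_exp))

print(sign, biased_exp, net_exp, integral)
```

    For -786.561 this reports a sign of 1, a biased exponent of 1032, a net exponent of 9 and an integral part of 786, in line with the fpdinfo output above.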

    Now that the secret is out, how about learning more about the FPU instructions and a non-conformant floating-point format such as REAL10? The example below demonstrates how one can explore multiple FPU operations with the help of the fpu_stack routine, using Linux64, FASM and cpu2.o as the library:

        ;------------------------
        ; fasm this.asm
        ; ld this.o cpu2.o -o this
        ;------------------------
                format ELF64
                public _start
    
                ;import these from cpu2.o
                extrn fpu_stack
                extrn fpu_sflag
                extrn prnline
                extrn exitx
    
                section '.data' writeable
        myExt   dt 56.154		;a REAL10 value
    
                section '.text' executable
        _start:
                sub     rsp,8      ;align stack
    
                finit              ;clear/init FPU
    
                fld     [myExt]    ;Load a REAL10
                fldpi              ;Load an FPU PI constant
                fldz               ;Load an FPU zero
                call    fpu_stack	 ;View FPU stack after loading some values
                call    prnline
    
                ;FDIV operation to produce an infinity (Div-by-Zero)
                fdiv    st2,st0
                call    fpu_stack  ;See what happens in ST2
                call    prnline
    
                ;View the FPU status flag
                call    fpu_sflag  ;Watch the 'Z' (div-by-zero) flag is up
    
                call    exitx

    Output:

        st0: +0.000000000000000000
        st1: +3.141592653589793238
        st2: +56.15400000000000000
        st3: ...
        st4: ...
        st5: ...
        st6: ...
        st7: ...
    
        st0: +0.000000000000000000
        st1: +3.141592653589793238
        st2: -Inf-
        st3: ...
        st4: ...
        st5: ...
        st6: ...
        st7: ...
    
        B C3 TP TP TP C2 C1 C0 IR SF  P  U  O  Z  D  I
        0  0  1  0  1  0  0  0  0  0  0  0  0  1  0  0 =2804
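
    The status word 2804h shown by fpu_sflag can also be decoded by hand. The Python sketch below is a small illustration only (decode_fpu_status is a hypothetical helper, not a CPU2.0 routine) that uses the same column labels as the dump, reading them as bit 15 on the left down to bit 0 on the right.

```python
def decode_fpu_status(word):
    """Break a 16-bit x87 status word into its named fields."""
    names = ["I", "D", "Z", "O", "U", "P", "SF", "IR"]   # bits 0..7
    fields = {name: (word >> bit) & 1 for bit, name in enumerate(names)}
    fields["C0"] = (word >> 8) & 1
    fields["C1"] = (word >> 9) & 1
    fields["C2"] = (word >> 10) & 1
    fields["TOP"] = (word >> 11) & 0b111    # the three TP bits: top-of-stack index
    fields["C3"] = (word >> 14) & 1
    fields["B"] = (word >> 15) & 1
    return fields

status = decode_fpu_status(0x2804)
print(status["Z"], status["TOP"])
```

    The Z (zero-divide) flag is set, and the top-of-stack field reads 5, which is what three loads onto a freshly initialised FPU stack produce (the index moves from 0 to 7, 6 and then 5).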

    This way, users of CPU2.0 do not have to go back and forth to a debugger to explore the dynamics and technicalities of FPU instructions. One can learn, modify, program and see all of them on the fly, as the code progresses. This is a most effective way of learning, and even teaching, because everything can be built up and modified quickly without going through the time-consuming process of debugging.


    Examples: Memory Viewing and Manipulation


    CPU2.0 comes with five memory-viewing routines, plus some other memory-related routines: memviewn, memview, memviewc, memviewb and stackview. All of them show positive and negative offsets so that you can navigate up and down the selected memory space. The listed offsets are accurate, so you can use them to build valid addressing modes to access (read or write) any portion of memory you are allowed to manipulate. memviewn and memviewb offer Little-Endian views so that you see how the CPU perceives information in memory.

    An example on Win64, via NASM and GCC64

        ;----------------------------------
        ;Win64 Compile >> nasm -f win64 this.asm
        ;Win64 Link    >> gcc -m64 this.obj cpu2.dll -s -o this.exe
        ;----------------------------------
        global main
        extern memviewc         ;import these from cpu2.dll
        extern prnstrz
        extern prnline
    
        section .data
        msg: db 'Hello CPA',0ah,0
    
        section .code
        main:
                sub     rsp,40          ;Shadow space and alignment
    
                mov     rdx,100         ;view 100 bytes
                mov     rcx,msg         ;View this memory area.
                call    memviewc        ;view as string bytes. Reversed Little-Endian
    
                ;memory manipulation. Change 'A' in 'CPA' to 'U'
                mov     byte[rcx+8],'U' ;From the output, 'A' is at offset 8
    
                call    prnline
    
                ;Now watch it again.
                mov     rdx,100
                mov     rcx,msg
                call    memviewc        ;See 'A' is changed to 'U'
    
                ;For proof, print the msg
                mov     rcx,msg
                call    prnstrz
    
                add     rsp,40
                ret

    The output:

        0000000000404010| H e l l o _ C P A _ _ _ _ _ _ _ |+F|+15
        0000000000404020| p - @ _ _ _ _ _ _ _ _ _ _ _ _ _ |+1F|+31
        0000000000404030| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+2F|+47
        0000000000404040| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+3F|+63
        0000000000404050| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+4F|+79
        0000000000404060| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+5F|+95
        0000000000404070| _ + @ _                         |+63|+99
    
        0000000000404010| H e l l o _ C P U _ _ _ _ _ _ _ |+F|+15
        0000000000404020| p - @ _ _ _ _ _ _ _ _ _ _ _ _ _ |+1F|+31
        0000000000404030| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+2F|+47
        0000000000404040| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+3F|+63
        0000000000404050| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+4F|+79
        0000000000404060| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |+5F|+95
        0000000000404070| _ + @ _                         |+63|+99
        Hello CPU

    Or perhaps you want to see, on the fly, how your stack and TOS (top of stack) change after pushing two quadword items, to learn about the stack:

        ;------------------------
        ; ml64 /c this.asm
        ;------------------------
        ; gcc -m64 this.obj cpu2.obj -o this.exe
        ;------------------------
        option casemap:none
        externdef stackview:proc   	;import these from cpu2.obj
        externdef prnline:proc
    
        .data
        x dq 45h
        y dq 77h
    
        .code
        main proc
    
            ;alignment and shadow space
            sub rsp,40
    
            ;Original stack before adding two items
            mov rcx,5     ;see 5 items of stack
            call stackview
            call prnline
    
            ;push two items onto the stack
            push x
            push y
    
            ;Stack after two "push"
            mov rcx,5
            call stackview
    
            add rsp,56	;Restore all stack
            ret
    
        main endp
        end

    Output:

        0000000000000000 |000000000061FE78	;First stackview
        000000000040FA20 |000000000061FE70
        0000000000000005 |000000000061FE68
        0000000000000000 |000000000061FE60
        00000000004013E8 |000000000061FE58
    
        0000000000000005 |000000000061FE68	;Second stackview after two PUSHes
        0000000000000000 |000000000061FE60
        00000000004013E8 |000000000061FE58	;Old TOS
        0000000000000045 |000000000061FE50
        0000000000000077 |000000000061FE48	;New TOS (RSP=0x61FE48, TOS=77h)
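
    The address arithmetic behind the two dumps can be replayed in a few lines. This Python sketch is a toy model, not CPU2.0 code: push is a hypothetical helper and the stack is just a dictionary keyed by address. It starts from the old TOS address shown in the first dump.

```python
def push(stack, rsp, value):
    """Model PUSH: move RSP down by 8, then store the quadword at the new RSP."""
    rsp -= 8
    stack[rsp] = value
    return rsp

stack = {}
rsp = 0x61FE58                 # old TOS address from the first dump
rsp = push(stack, rsp, 0x45)   # push x
rsp = push(stack, rsp, 0x77)   # push y

print(f"RSP={rsp:X} TOS={stack[rsp]:X}h")
```

    Each push lowers RSP by 8 bytes, so two pushes land the new TOS at 61FE48 holding 77h, with 45h sitting just above it at 61FE50.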

    Observe that, as per convention, CPU2.0 shows the stack growing downwards and shrinking upwards in memory, preserving the semantics of the push and pop stack operations of the x86 architecture.

    Or maybe you’re on Linux64 right now and want to see how the decimal number 1234567 is perceived by the CPU in memory. Using NASM on Linux64 and the cpu2.o binary:

        ;------------------------
        ; nasm -f elf64 this.asm
        ; ld this.o cpu2.o -o this
        ;------------------------
                global _start
                extern memviewb    ;from cpu2.o
                extern prnline
    
                section .data
        myVal:  dq 1234567         ;an 8-byte value
    
                section .text
        _start:
                mov     rdx,1      ;Option: view in decimal format
                mov     rsi,10     ;view ten bytes anyway
                mov     rdi,myVal  ;view at this address
                call    memviewb   ;View as flat Little-Endian
    
                call    prnline
    
                xor     edi,edi    ;Exit to terminal
                mov     eax,60
                syscall

    Output (in 10 flat bytes):

        000 |00000000006000DD
        000 |00000000006000DC
        000 |00000000006000DB	;High address
        000 |00000000006000DA
        000 |00000000006000D9
        000 |00000000006000D8	;a sequence of 8 bytes (quadword,DQ)
        000 |00000000006000D7
        018 |00000000006000D6
        214 |00000000006000D5
        135 |00000000006000D4	;Low address

    You may be surprised to see that decimal 1234567 does not look like 1234567 at all in a flat eight-byte sequence of memory, even when each byte is shown in decimal format.
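
    The reason is that the value is stored as the binary number 12D687h, one byte per address, least-significant byte first. A short Python check using only built-in integer methods reproduces the byte values in the dump:

```python
value = 1234567                      # 12D687h
raw = value.to_bytes(8, "little")    # the 8 bytes of a DQ, lowest address first

for offset, byte in enumerate(raw):
    print(f"+{offset}: {byte:03d} ({byte:02X}h)")
```

    The first three bytes come out as 135 (87h), 214 (D6h) and 018 (12h), followed by zeros, which is the sequence memviewb lists from the low address upwards.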

    The mem_load routine lets you load the content of a file into memory. You can then use other supporting routines such as memview, memviewc or memviewn (to display the memory content) or prnstrz (if the loaded file is a text file), or use mem_tofile to save the content to a new file for future reference.

    Using ML64, GCC64 and cpu2.dll, you can get creative with mem_load and its supporting routines. In this example, you load and view the content of our own cpu2.dll. (Warning: long output)

        ;----------------------------------
        ;Win64 Compile >> ml64 /c this.asm
        ;Win64 Link    >> gcc -m64 this.obj cpu2.dll
        ;----------------------------------
        option casemap:none
    
        externdef mem_load:proc       ;import these from cpu2.dll
        externdef mem_free:proc
        externdef memview:proc
        externdef ExitProcess:proc    ;import this from kernel32.dll
    
        .data
        myFile db 'cpu2.dll',0        ;load this to mem, and see content
    
        .code
        main proc
    
                sub rsp,40
    
                mov rcx,offset myFile
                call mem_load   ;Return size in RDX. Pointer in RAX
    
                mov r8,rax      ;Duplicate pointer
    
                                ;Size is already in RDX (2nd arg)
                mov rcx,rax     ;use pointer
                call memview    ;View content of entire "cpu2.dll"
    
                mov rcx,r8
                call mem_free   ;Free memory from "mem_load"
    
                xor ecx,ecx
                call ExitProcess
    
        main endp
        end

    Sample output (somewhere in the middle of such long output):

        ...
        000000000002A590| 65 72 73 65 00 73 74 72 |+A597|+42391
        000000000002A598| 5F 74 72 69 6D 00 73 74 |+A59F|+42399
        000000000002A5A0| 72 5F 77 6F 72 64 63 6E |+A5A7|+42407
        000000000002A5A8| 74 00 73 74 72 5F 74 6F |+A5AF|+42415
        000000000002A5B0| 6B 65 6E 00 73 74 72 5F |+A5B7|+42423
        000000000002A5B8| 66 69 6E 64 00 73 74 72 |+A5BF|+42431
        000000000002A5C0| 5F 66 69 6E 64 7A 00 73 |+A5C7|+42439
        000000000002A5C8| 74 72 5F 61 70 70 65 6E |+A5CF|+42447
        000000000002A5D0| 64 7A 00 73 74 72 5F 61 |+A5D7|+42455
        ...

    The two right-most columns are the offsets, in hex and decimal respectively, which let you reference or pinpoint any particular byte you’re interested in when building your addressing modes. They are precise.


    Examples: Converters


    CPU2.0 is very rich in string conversion routines, in both directions. You can convert from practically any base to another, up to base-36. All of the number display routines (prnxxx) are, in effect, converters of this kind. In addition, each has an (xxx2str) counterpart, so that you can later use string display functions from other sources instead of the defaults provided by CPU2.0.

    For example, to display a double-precision value, CPU2.0 uses the prndbl and prndble functions. Alternatively, you can use their 2str (to string) counterparts, dbl2str and dble2str, to do the same job and then display the result later using any string display service you know, such as C’s printf. (Please note that CPU2.0 does not use any C library.)

    Example below demonstrates how a user input is converted to Base-19 and Base-25 strings using int2str and bconv converter routines respectively. To make it more fun, we use an interactive program via FASM, linker ld, cpu2.o and on Linux64. Observe the calling convention used;

        ;------------------------
        ; fasm this.asm
        ; ld this.o cpu2.o -o this
        ;------------------------
                format ELF64
                public _start
    
                ;import these from cpu2.o
                extrn prnstrz
                extrn readint
                extrn int2str
                extrn bconv
                extrn prnline
    
                section '.data' writeable
        prompt1 db 'Enter an integer to convert: ',0
        msg19   db 'Your integer in Base-19 is: ',0
        msg25   db 'Your integer in Base-25 is: ',0
        val1    dq 0
        buff    rb 80
    
                section '.text' executable
        _start:
                sub     rsp,8
    
                mov     rdi,prompt1     ;Display prompt
                call    prnstrz
                call    readint         ;Get user input
                mov     [val1],rax      ;copy to val1
    
                mov     rcx,0           ;Option. Unsigned
                mov     rdx,19          ;Use Base-19
                mov     rsi,buff        ;Buffer to keep the string
                mov     rdi,[val1]      ;The value to convert
                call    int2str
    
                mov     rdi,msg19       ;Display msg19
                call    prnstrz
                mov     rdi,buff        ;Display the returned string
                call    prnstrz
    
                call    prnline
    
                mov     rdi,msg25       ;Display msg25
                call    prnstrz
                mov     rsi,25          ;Use Base-25
                mov     rdi,[val1]      ;The value to convert
                call    bconv
    
        done:   call    prnline
                xor     edi,edi
                mov     eax,60
                syscall

    Output (A user input a Hex value)

        Enter an integer to convert: C334EFh
        Your integer in Base-19 is: 5332GA
        Your integer in Base-25 is: 17IIML
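
    The conversion itself is plain repeated division. As a cross-check of the output above, here is a minimal Python sketch of the technique; to_base is a hypothetical helper written for this page and says nothing about how int2str or bconv are implemented internally. It assumes the usual digit set 0-9 followed by A-Z.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    """Render a non-negative integer in any base from 2 to 36."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    out = []
    while n:
        n, digit = divmod(n, base)   # peel off the least-significant digit
        out.append(DIGITS[digit])
    return "".join(reversed(out))

value = 0xC334EF
print(to_base(value, 19))
print(to_base(value, 25))
```

    For the input C334EFh this gives 5332GA in base 19 and 17IIML in base 25, the same strings the assembly program printed.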

    Conversely, one can also convert a null-terminated string to its integer or float counterpart via the str2int, str2dbl and str2flt functions, and then display the converted value any way necessary: as signed, as unsigned or in any other number format. Using NASM on Win64, cpu2.obj and GCC64 as the linker:

        ;------------------------
        ; nasm -f win64 this.asm
        ; gcc -m64 this.obj cpu2.obj -s -o this.exe
        ;------------------------
                global main			    ;GCC entry point
    
                ;import these from cpu2.obj
                extern str2int
                extern str2flt
                extern prnhexu
                extern prnoctu
                extern prnflt
                extern prnline
    
                section .data
        iVal:   db '234566o',0               ;An unsigned octal integer
        xVal:   db '-234566o',0              ;A signed octal
        fVal:   db '0.0000000000000031265',0 ;a very small float
    
                section .text
        main:
                sub     rsp,40
    
                ;Convert string to integer
                mov     rcx,iVal  ;An octal string
                call    str2int   ;return integer in RAX
                mov     rcx,rax
                add     rcx,1     ;Proof that returned value is an integer
                call    prnoctu   ;Display as unsigned octal
    
                call    prnline
    
                ;Convert string to integer
                mov     rcx,xVal  ;Another octal string
                call    str2int   ;return integer in RAX
                mov     rcx,rax
                call    prnhexu   ;Display/convert to/as unsigned Hex
    
                call    prnline
    
                ;Convert string to Float
                mov     rcx,fVal
                call    str2flt   ;Return single-precision in XMM0
                call    prnflt    ;Display a float in XMM0
    
        done:   add     rsp,40
                ret

    Output:

        234567
        FFFFFFFFFFFEC68A
        3.126499E-15
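
    The two integer results can be verified independently. The Python sketch below parses the same octal strings and shows the negative value as a 64-bit two's complement pattern; parse_octal is a hypothetical helper that only handles the trailing-o notation used in the listing and is not meant to mirror str2int's full behaviour.

```python
MASK64 = (1 << 64) - 1

def parse_octal(text):
    """Parse strings such as '234566o' or '-234566o' into a Python integer."""
    negative = text.startswith("-")
    digits = text.lstrip("-").rstrip("o")
    value = int(digits, 8)
    return -value if negative else value

# Add 1 and show as octal, like the first test in the listing.
print(format(parse_octal("234566o") + 1, "o"))
# Mask to 64 bits to see the pattern an unsigned hex dump would show.
print(format(parse_octal("-234566o") & MASK64, "016X"))
```

    The first line prints 234567 and the second prints FFFFFFFFFFFEC68A, matching the prnoctu and prnhexu output.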

    Examples: Registers Dump


    The heart of mastering assembly programming is constant awareness of operands and registers, coupled with the effective use of register viewing routines. CPU2.0 is well aware of that. For this very reason, CPU2.0 offers various register dumping routines such as

  • dumpreg - View all 64-bit registers in Unsigned Hexadecimal
  • dumpregd - View all 64-bit registers in Signed integer format
  • dumpregdu - View all 64-bit registers in Unsigned integer format
  • dumprego - View all 64-bit registers in Signed octal format
  • dumpregou - View all 64-bit registers in Unsigned octal format
  • dumpregb - View all 64-bit registers in Unsigned binary format
  • flags - View the CPU’s RFLAGS status

    These routines are very useful for users who deal with various numeral systems, such as octal, or who perform complex integer arithmetic and bitwise operations. CPU2.0 lets users call any of these routines at any point where they are needed, as many times as they like.

    For example, to see how the bitwise AND instruction works on the RAX register to round an operand down to the next lower multiple of 16 (Win64, GoLink, ML64, cpu2.dll):

        ;------------------------
        ; ml64 /c this.asm
        ; golink /console this.obj cpu2.dll -o this
        ;------------------------
        option casemap:none
    
        ;import these from cpu2.dll
        externdef dumpreg:proc
        externdef exitx:proc
    
        .code
        start proc
                sub     rsp,40      	;align stack
                mov     rax,34567h
                and     rax,-16         	;even down to modulo 16
                call    dumpreg		;See change in RAX, as hex.
                call    exitx
        start endp
        end

    Output:

        RAX|0000000000034560 RCX|0000000000222000 RDX|0000000000401000
        RBX|0000000000000000 R8 |0000000000222000 R9 |0000000000401000
        RDI|0000000000000000 RSI|0000000000000000 R10|0000000000000000
        R11|0000000000000000 R12|0000000000000000 R13|0000000000000000
        R14|0000000000000000 R15|0000000000000000 RBP|0000000000000000
        RSP|000000000014FF50 RIP|000000000040100F [UHEX]

    The [UHEX] tag is a constant reminder that you are currently viewing the registers in unsigned hexadecimal format and not in any other format. This is important because the same digits can denote different values in different formats.

    The output in RAX shows that the value has been rounded down to the nearest multiple of 16 by the and rax,-16 instruction. This is a familiar technique found in string-length and fast memory-copying routines, and in stack/memory alignment.

    Similarly, a beginner can quickly learn and build confidence with signed and unsigned binary values by using dumpregb. For example, to see how the CPU stores negative numbers in two's complement format, two techniques are the NEG instruction and the NOT instruction followed by adding 1. Using NASM, the GCC64 linker and cpu2.obj, one can quickly build two test cases and see the results instantly, without any guesswork.

        ;------------------------
        ; nasm -f win64 this.asm
        ; gcc -m64 this.obj cpu2.obj -s -o this.exe
        ;------------------------
                global main
    
                ;import these from cpu2.dll/cpu2.obj
                extern dumpregb
                extern prnline
    
                section .text
        main:
                sub     rsp,40
    
                mov     rax,45      ;a positive integer
                mov     rbx,45	  ;The same value
                call    dumpregb    ;View RAX and RBX in unsigned binaries
    
        	 ;Using NEG against RAX
                neg     rax         ;Negate RAX. Two's complement (negative) form of +45.
                call    prnline
    
        	 ;Using NOT against RBX
                not     rbx         ;Invert RBX. One's Complement form
                add     rbx,1       ;+1. Two's complement (negative form) of +45
                call    dumpregb    ;View as unsigned bits. Watch RAX and RBX
    
                add     rsp,40
                ret

    From the output below, one can immediately see that RAX and RBX hold the same value, the two's complement representation of -45:

        RAX|00000000 00000000 00000000 00000000 00000000 00000000 00000000 00101101
        RCX|00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
        RDX|00000000 00000000 00000000 00000000 00000000 00011110 00010011 01010000
        R8 |00000000 00000000 00000000 00000000 00000000 00011110 01000011 10110000
        R9 |00000000 00000000 00000000 00000000 00000000 00011110 00010011 01010000
        RBX|00000000 00000000 00000000 00000000 00000000 00000000 00000000 00101101
        --snip--
    
        RAX|11111111 11111111 11111111 11111111 11111111 11111111 11111111 11010011
        RCX|00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
        RDX|00000000 00000000 00000000 00000000 00000000 00011110 00010011 01010000
        R8 |00000000 00000000 00000000 00000000 00000000 00011110 01000011 10110000
        R9 |00000000 00000000 00000000 00000000 00000000 00011110 00010011 01010000
        RBX|11111111 11111111 11111111 11111111 11111111 11111111 11111111 11010011
        --snip--

    On the same note, if you're dealing with lots of signed and unsigned integer operations, you can use dumpregd and dumpregdu respectively and enjoy the same accuracy.


    Examples: System Probe


    Besides being educational, CPU2.0 is also strong in systems work and is handy in reverse engineering. For example, memviewc can also view other memory areas using negative offsets, such as the MZ header of your own executable. An example on Win64, using FASM, GoLink and cpu2.dll:

        ;----------------------------------
        ;Win64 Compile >> fasm this.asm
        ;Win64 Link    >> golink /console this.obj cpu2.dll kernel32.dll
        ;----------------------------------
        format MS64 COFF
        public start
    
        extrn memviewc          ;import this from cpu2.dll
        extrn ExitProcess       ;import this from kernel32.dll
    
        section '.data' data readable writeable
        msg: db 'Hello CPA',0ah,0
    
        section '.text' code readable executable
        start:
    
                sub rsp,40		;Shadow space and alignment
    
                mov rdx,-2000h     ;view -8192 negative offsets. Yours may vary
                mov rcx,msg        ;View this starting area but further back
                call memviewc      ;See your "MZ" header and stub
    
                xor ecx,ecx
                call ExitProcess

    Output (long output)

        ...
        000000000040007F| _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |-1F90|-8080
        000000000040006F| _ _ _ _ \ _ _ f _ _ _ d _ _ E P |-1FA0|-8096
        000000000040005F| _ _ _ _ _ _ _ _ m o c . l o o T |-1FB0|-8112
        000000000040004F| v e D o G . w w w _ k n i L o G |-1FC0|-8128
        000000000040003F| _ _ _ ` ! _ L _ ! _ _ _ _ _ _ $ |-1FD0|-8144
        000000000040002F| _ _ ! m a r g o r P _ 4 6 n i W |-1FE0|-8160
        000000000040001F| _ _ _ _ _ _ _ @ _ _ _ _ _ _ _ _ |-1FF0|-8176
        000000000040000F| _ _ _ _ _ _ _ _ _ _ _ _ _ l Z M |-2000|-8192

    If you need to reveal more technical information about your executable or DLL header, you can use the memviewn or memview routines instead, in combination with dumpreg. The information is then shown as numbers rather than text.

        ;----------------------------------
        ;Win64 Compile >> fasm this.asm
        ;Win64 Link    >> golink /console this.obj cpu2.dll kernel32.dll
        ;----------------------------------
        format MS64 COFF
        public start
    
        extrn memview           ;import these from cpu2.dll
        extrn prnline
        extrn dumpreg
        extrn ExitProcess       ;import this from kernel32.dll
    
        section '.data' data readable writeable
        msg: db 'Hello CPA',0ah,0
    
        section '.text' code readable executable
        start:
    
                sub rsp,40        ;Shadow space and alignment
    
                mov rax,msg
                call dumpreg      ;Verify position of .data section
    
                call prnline      ;Separate the two output
    
                mov rdx,200       ;view 200 bytes
                mov rcx,400000h   ;View from BASE address. Around RAX
                call memview      ;'MZ' magic number, COFF stub and others
    
                xor ecx,ecx
                call ExitProcess

    Output:

        RAX|0000000000402000 RCX|000000000026F000 RDX|0000000000401000
        RBX|0000000000000000 R8 |000000000026F000 R9 |0000000000401000
        RDI|0000000000000000 RSI|0000000000000000 R10|0000000000000000
        R11|0000000000000000 R12|0000000000000000 R13|0000000000000000
        R14|0000000000000000 R15|0000000000000000 RBP|0000000000000000
        RSP|000000000014FF30 RIP|000000000040100E [UHEX]
    
        0000000000400000| 4D 5A 6C 00 01 00 00 00 |+7|+7	;‘MZ’ header
        0000000000400008| 02 00 00 00 FF FF 00 00 |+F|+15
        0000000000400010| 00 00 00 00 11 00 00 00 |+17|+23
        0000000000400018| 40 00 00 00 00 00 00 00 |+1F|+31
        0000000000400020| 57 69 6E 36 34 20 50 72 |+27|+39
        0000000000400028| 6F 67 72 61 6D 21 0D 0A |+2F|+47
        0000000000400030| 24 B4 09 BA 00 01 CD 21 |+37|+55
        0000000000400038| B4 4C CD 21 60 00 00 00 |+3F|+63
        0000000000400040| 47 6F 4C 69 6E 6B 20 77 |+47|+71
        0000000000400048| 77 77 2E 47 6F 44 65 76 |+4F|+79
        0000000000400050| 54 6F 6F 6C 2E 63 6F 6D |+57|+87
        0000000000400058| 00 00 00 00 00 00 00 00 |+5F|+95
        0000000000400060| 50 45 00 00 64 86 03 00 |+67|+103  ;‘PE’ magic number
        0000000000400068| 7F CE 19 5C 00 00 00 00 |+6F|+111
        0000000000400070| 00 00 00 00 F0 00 03 00 |+77|+119
        0000000000400078| 0B 02 01 00 00 02 00 00 |+7F|+127
        0000000000400080| 00 04 00 00 00 00 00 00 |+87|+135
        0000000000400088| 00 10 00 00 00 10 00 00 |+8F|+143
        0000000000400090| 00 00 40 00 00 00 00 00 |+97|+151
        0000000000400098| 00 10 00 00 00 02 00 00 |+9F|+159
        00000000004000A0| 05 00 02 00 00 00 00 00 |+A7|+167
        00000000004000A8| 05 00 02 00 00 00 00 00 |+AF|+175
        00000000004000B0| 00 40 00 00 00 02 00 00 |+B7|+183
        00000000004000B8| 6D DD 00 00 03 00 00 00 |+BF|+191
        00000000004000C0| 00 00 10 00 00 00 00 00 |+C7|+199

    Another example is instruction encoding. Using CPU2.0's opcode and opsize routines, one can see the encoding produced by a particular assembler. The example below shows the encoding of the instruction add eax,1 produced by NASM with its default optimization setting, on Win64 using NASM and GCC64:

    ;------------------------
    ; nasm -f win64 this.asm
    ; gcc -m64 this.obj cpu2.obj -s -o this.exe
    ;------------------------
            global main
    
            ;import this from cpu2.dll/cpu2.obj
            extern opcode
    
    a:
            add eax,1
    b:
    
            section .text
    main:
            sub     rsp,40
    
            mov     rdx,b
            mov     rcx,a
            call    opcode
    
            add     rsp,40
            ret

    Output

    01 C0 83

    From the output one can see that NASM produces a 3-byte encoding for this instruction. What if you assemble with optimization off? Let's see the encoding NASM produces for the same instruction with the -O0 switch:

    ;------------------------
    ; nasm -f win64 this.asm -O0
    ; gcc -m64 this.obj cpu2.obj -s -o this.exe
    ;------------------------
            global main
    
            ;import this from cpu2.dll/cpu2.obj
            extern opcode
    
    a:
            add eax,1
    b:
    
            section .text
    main:
            sub     rsp,40
    
            mov     rdx,b
            mov     rcx,a
            call    opcode
    
            add     rsp,40
            ret

    Output

    00 00 00 01 05

    Now NASM produces a 5-byte encoding instead of 3 bytes. This is one way to see, on the fly, how an optimizing assembler such as NASM uses multi-pass assembly, one of its core features, to select shorter encodings.


    Examples: C/C++ access


    Another strong point of CPU2.0 is accessibility from high-level languages such as C and C++, made possible by its ABI compliance. When accessed this way, CPU2.0 routines reveal vital information about C/C++ internal structures and the CPU state at the point of insertion. You only have to supply the correct argument types to the designated parameters and capture the correct return values. Note that not all CPU2.0 routines are suitable for high-level access due to their nature.

    The example below demonstrates the power of CPU2.0 when accessed from C on Win64:

    // Example: Accessing CPU2.0 Library from C
    // Binaries (cpu2.o,cpu2.obj) should reside in working folder
    // or in searchable PATH of your own choosing
    
    //Compile>> gcc -m64 this.c cpu2.obj -s -o this.exe (For Win64)
    //Compile>> gcc -m64 this.c cpu2.o -s -o this (For Linux64)
    
    #include <stdio.h>
    #include <math.h>
    
    //Import these from CPU2.0
    extern void dumpreg();
    extern void dumpxmmf(unsigned long long);
    extern void stackview(unsigned long long);
    extern void fpdinfo(double);
    extern void fpfinfo(float);
    
    int main()
    {
    
        float x=56.456;
        double y=0.454,w=1.234;
        double z=0.0;
    
        //Display 5 items of stack of main();
        stackview(5);
        putchar('\n');
    
        //Display register state of main();
        dumpreg();
        putchar('\n');
    
        //Display IEEE-754 Floating point Double format
        fpdinfo(y);
        putchar('\n');
    
        //Display IEEE-754 Floating point Single format
        fpfinfo(x);
        putchar('\n');
    
        //See what is returned by C's POW(), via XMM registers
        z=pow(w,y);
        dumpxmmf(9);  //View as (formatted) packed doubles
    
        return 0;
    }

    This produces the following output:

    0000000000000001 |000000000061FE20
    00000000004017B5 |000000000061FE18
    FFFFFFFFFFFFFFFF |000000000061FE10
    0000000000000001 |000000000061FE08
    00000000001B1330 |000000000061FE00
    
    RAX|000000000000000A RCX|00000000FFFFFFFF RDX|0000000000000001
    RBX|0000000000000001 R8 |00007FFBD6985920 R9 |000000000061E2A0
    RDI|00000000001B1330 RSI|0000000000000006 R10|0000000000000000
    R11|0000000000000246 R12|0000000000000001 R13|0000000000000008
    R14|0000000000000000 R15|0000000000000000 RBP|000000000061FE50
    RSP|000000000061FE00 RIP|0000000000401601 [UHEX]
    
    0.454 = 3FDD0E5604189375
    0.01111111101.1101000011100101011000000100000110001001001101110101
    S.EXPONT   +1.MANT
    SIGN: 0
    EXP : -2 (1021 - 1023)
    MANT: 1.815999999999999
    
    56.456 = 4261D2F2
    0.10000100.11000011101001011110010
    S.EXPONT 1.MANT
    SIGN: 0
    EXP : +5 (+132 - 127)
    MANT: 1.76425
    
     xmm0:                    +0.0|     +1.100163120495035|
     xmm1:                    +0.0|                 +0.454|
     xmm2:                    +0.0|                   +0.0|
     xmm3:                    +0.0|                   +0.0|
     xmm4:                    +0.0|                   +0.0|
     xmm5:                    +0.0|                   +0.0|
     xmm6:                    +0.0|                   +0.0|
     xmm7:                    +0.0|                   +0.0|
     xmm8:                    +0.0|                   +0.0|
     xmm9:                    +0.0|                   +0.0|
    xmm10:                    +0.0|                   +0.0|
    xmm11:                    +0.0|                   +0.0|
    xmm12:                    +0.0|                   +0.0|
    xmm13:                    +0.0|                   +0.0|
    xmm14:                    +0.0|                   +0.0|
    xmm15:                    +0.0|                   +0.0|

    Similarly, the same effect can be achieved from C++, using similar code as shown below:

    // Example: Accessing CPU2.0 Library from C++
    // Binaries (cpu2.o,cpu2.obj) should reside in working folder
    // or in searchable PATH of your own choosing
    
    //Compile>> g++ -m64 this.cpp cpu2.obj -s -o this.exe (For Win64)
    //Compile>> g++ -m64 this.cpp cpu2.o -s -o this (For Linux64)
    
    #include <iostream>
    #include <cstdio>   // putchar
    #include <cmath>
    
    //Import these from CPU2.0
    extern "C"
    {
        void dumpreg();
        void dumpxmmf(unsigned long long);
        void stackview(unsigned long long);
        void fpdinfo(double);
        void fpfinfo(float);
    }
    
    int main()
    {
    
        float x=56.456;
        double y=0.454,w=1.234;
        double z=0.0;
    
        //Display 5 items of stack of main();
        stackview(5);
        putchar('\n');
    
        //Display register state of main();
        dumpreg();
        putchar('\n');
    
        //Display IEEE-754 Floating point Double format
        fpdinfo(y);
        putchar('\n');
    
        //Display IEEE-754 Floating point Single format
        fpfinfo(x);
        putchar('\n');
    
        //See what is returned by C's POW(), via XMM registers
        z=pow(w,y);
        dumpxmmf(9);  //View as (formatted) packed doubles
    
        return 0;
    }

    This comes in handy if you want to know what's happening under the hood of your C/C++ programs, particularly when you deal with lots of intrinsics and floating-point data. CPU2.0 gives you a low-level view of your high-level programs.


    Examples: Others


    CPU2.0 offers many other supporting routines in addition to those featured here. Do explore them and build up your familiarity with them. They will help you a lot through your stages of learning and, eventually, building applications.


    Q&A


    Q: Is CPU2.0 a debugger?
    A: In many ways it behaves like one, but CPU2.0 was never designed to take the role of a debugger. It is designed to help assembly language programmers interface with the CPU in a protected and controlled execution environment. As a result, CPU2.0 shares many similarities with a debugger.

    Q: Is CPU2.0 standalone?
    A: It is. CPU2.0 depends only on OS kernel services and uses no third-party libraries. That means CPU2.0 is relatively fast, has a very small memory footprint, and is lightweight and easy to set up and use, with no confusing include files and with simple linking and execution. CPU2.0 essentially offers a very thin layer between applications and the machine internals.

    Q: Is CPU2.0 an industrial product?
    A: No. CPU2.0 is designed for teaching and learning, offering only lightweight routines suitable for chalk-and-talk methods or for quickly building proofs and test-case code. It should not be used for heavyweight tasks or those that require industrial precision. However, with enough expertise and familiarity with CPU2.0, you can accomplish big tasks with it.

    Q: Is CPU2.0 a GUI library?
    A: No. CPU2.0 is a CPU-interfacing tool with strong emphasis on machine instructions. It is not a GUI application and should be run in a command-line environment only. With the right settings and expert use of CPU2.0, you can however probe your GUI programs with low-level CPU2.0 routines.

    Copyrights & Licence


    (c)2018-2019. Soffian Abdul Rasad & Sarah Safarina Rahmat. All rights reserved.

    This program is free for commercial and non-commercial use as long as the following conditions are adhered to.

    Copyright remains Soffian Abdul Rasad & Sarah Safarina Rahmat, and as such any Copyright notices in the code are not to be removed.

    Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

    The licence and distribution terms for any publically available version or derivative of this code cannot be changed. i.e. this code cannot simply be copied and put under another distribution licence (including the GNU Public Licence).


    Disclaimer


    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.


    Contacts


    If you have any suggestions, bug reports or any problems when using CPU2.0, do write to us via soffianabdulrasad @ gmail . com (no spaces).