fanf: (Default)
[personal profile] fanf

https://dotat.at/@/2024-08-02-turing-c.html

Yesterday there was some discussion on the Orange Site about whether or not C is Turing complete. The consensus in the StackOverflow question is,

  • no, because the C abstract machine is a (large) finite state machine,

  • or maybe yes, if you believe that unaddressable local variables can exist outside the finite address space.

My answer is definitely yes, if you include the standard IO library.

And using IO is much closer to Turing's original model of a finite state machine working on unbounded storage.

C is a finite state machine

The C abstract machine limits the size of the state it can work on by limiting the size of pointers and the size of the objects that can be pointed to. There are some unaddressable objects but they are usually understood to be a small finite number of machine registers.

So the number of states in the C abstract machine is,

    ptr_bits = CHAR_BIT * sizeof(char *);
    memory_bytes = 1 << ptr_bits;
    total_bytes = memory_bytes + register_bytes;
    total_bits = CHAR_BIT * total_bytes;
    number_of_states = 1 << total_bits;

Typically about 2^(2^(2^6)) states. Which is a lot, but still finite.

is IO really unbounded?

So, for C to be Turing complete, it must support unbounded IO.

Traditionally, C stdio supports two kinds of stream:

  • unbounded streams, such as terminals;
  • seekable streams, such as files.

What we need is an unbounded seekable stream. It has to be seekable because we need to be able to move in both directions, and the only way to move backwards is to seek.

But aren't seeks bounded? Well, no, it turns out. The declaration of fseek is,

    int fseek(FILE *stream, long offset, int whence);

What is notable is that in many real-world implementations, file sizes have been measured with 64 bit numbers, but the fseek long offset is been only 32 bits.

Therefore the size of a seek offset does not limit the size of a stdio file.

We can read and write within some finite distance relative to the file position indicator, and (so long as we treat its value as unknowable and incomparable) after a series of reads and writes and seeks the file position indicator can become arbitrarily large.

conclusion

So, in theory (after all, this is a theoretical model of computation) we can have unbounded seekable IO streams in C, so we can implement a Turing machine in C.

That's the point of this post, so you can stop reading. But I worked out the details, so read on if you like code. (The trick is in the read and write part.)

the symbols

A Turing machine has an alphabet of symbols or marks on the tape.

All of the tape is blank, except for a known region around the head of the machine. The known region covers the initial input (which is finite) and any part of the tape that has been visited by the head.

In a state transition, the machine can write a new symbol, erase the symbol, or leave it unchanged. To make things more uniform, we always write a new symbol: we write the blank symbol to erase, and write the same symbol to leave it unchanged.

As an implementation detail, we add a special symbol to represent the infinite_blanks to the left and right of the known region. The state transitions for infinite_blanks are the same as normal blanks. The Turing machine cannot write infinite_blanks; it must write one normal blank at a time.

    enum mark {
        infinite_blanks,
        blank,
        // ... other marks ...
        mark_count
    };

the state transitions

A Turing machine has a set of states, including a particular initial start state, and a final halting stop state.

In mathematical descriptions, the machine's state transition function is described using a collection of 5-tuples:

  • the current state
  • the symbol under the head
  • the symbol to replace it with
  • which direction to move or not
  • the next state

The write happens before the move.

We represent it as a 2D array indexed by the current state and mark, containing instructions about what to do. There are no transitions for the stop state.

    enum state {
        start,
        // ... other states ...
        stop,
        state_count = stop
    };

    static struct transition {
        enum { move_left, move_not, move_right } move;
        enum mark mark;
        enum state state;
    } machine[state_count][mark_count] = {
        // ...
        // define your Turing machine here
        // ...
    };

the tape

We represent the tape using two files, one containing all the tape to the left of the head, and one containing all the tape to the right of the head. The cell of the tape under the head is represented separately, in memory.

The files contain a sequence of cells represented in binary as enum mark objects accessed using fread() and fwrite(). Each file is divided into three parts:

  • The first cell contains infinite_blanks. (No other cells contain infinite_blanks.) It is a sentinel that allows us to avoid trying to seek before the start of the file.

  • The cells between the first cell and the file position indicator are (the left or right parts of) the known region of the tape. The file position indicator is manipulated using fseek().

  • There can be garbage cells after the file position indicator. C does not give us a way to truncate the file to erase unused cells, so we just ignore them.

The files are in opposite orders: the left file reads left to right, and the right file reads right to left, from the beginning of each file to its file position indicator.

the machine

To start, we need to be given the initial contents of the tape.

There are three command line arguments: the left file, the cell under the head, and the right file.

After opening the files we need to move the file position indicator to the end of each file.

    int main(int argc, char *argv[])
    {
        FILE *left = fopen(argv[1], "wb+");
        fseek(left, 0, SEEK_END);

        enum mark mark = atoi(argv[2]);

        FILE *right = fopen(argv[3], "wb+");
        fseek(right, 0, SEEK_END);

The operation of the machine itself is sraightforward.

The read_write() function pushes a new cell onto the second file, representing the overwritten contents of the cell that used to be under the head; and it pops a cell from the first file that becomes the cell under the head, and returns its contents.

        enum state state = start;
        while(state != stop) {
            struct transition transition = machine[state][mark];
            if(transition.move == move_not)
                mark = transition.mark;
            if(transition.move == move_left)
                mark = read_write(left, right, transition.mark);
            if(transition.move == move_right)
                mark = read_write(right, left, transition.mark);
            state = transition.state;
        }

When the machine stops, we need to remove the garbage from the left and right files, and print the contents of the cell under the head, so that the caller can see the final contents of the tape.

        cleanup(left, argv[1]);
        cleanup(right, argv[3]);
        printf("%d\n", mark);
        return(0);
    }

read and write

This is the tricky part, where we make the files work as unbounded one-ended tapes.

Pushing a cell onto a file is easy: it's just fwrite(), which extends the file as necessary (or overwrites garbage). This is where unbounded storage is created.

Popping a cell is more faff, because we have to read backwards. We have to move the file position indicator before the cell, read the cell -- which moves the file position indicator after the cell, so we have to move the file position indicator before it again.

When we reach the start of the file, we skip the second seek, so that the infinite_blanks always remain outside the known region of the tape, before the file position indicator.

    #define MARKSIZE (long)sizeof(enum mark)

    static enum mark
    read_write(FILE *read, FILE *write, enum mark mark)
    {
        fwrite(&mark, sizeof(mark), 1, write);
        fseek(read, -MARKSIZE, SEEK_CUR);
        fread(&mark, sizeof(mark), 1, read);
        if(mark != infinite_blanks)
            fseek(read, -MARKSIZE, SEEK_CUR);
        return(mark);
    }

cleanup

I mentioned before that C does not give us a way to truncate a file to a particular size. In fact we need a way to truncate an unbounded file to an arbitrarily large size, so POSIX truncate() is no good either because off_t is finite and our file position indicator is unknowable.

To clean up the garbage, we copy the known region of the tape to a replacement file, in an incremental streaming manner. We read backwards towards the start of the file and write forwards extending the replacement file, which reverses the contents of the tape. So we have to copy each file twice.

    static void copy_and_reverse(FILE *read, FILE *write) {
        enum mark mark = infinite_blanks;
        do mark = read_write(read, write, mark);
        while(mark != infinite_blanks);
    }

    static void cleanup(FILE *fp, const char *name) {
        FILE *tmp = tmpfile();
        copy_and_reverse(fp, tmp);
        fclose(fp);
        fp = fopen(name, "wb"); // make it empty
        copy_and_reverse(tmp, fp);
        fclose(fp);
        fclose(tmp);
    }

done!

I have omitted any error checking from this code, since it's a theoretical model, and in theory it cannot possibly go wrong.

This account has disabled anonymous posting.
(will be screened if not on Access List)
(will be screened if not on Access List)
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting

If you are unable to use this captcha for any reason, please contact us by email at support@dreamwidth.org

June 2025

S M T W T F S
1234567
8 91011121314
15161718192021
22232425262728
2930     

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated 2025-06-18 15:00
Powered by Dreamwidth Studios