Rethinking The C Time API

Out of all the components of C, its time API is probably the one most plagued with legacy cruft, to the point that almost every regularly used element of it has some design decision that has been obsolete for decades.

As an example, here is some code I use to print the current time for my status bar:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
int main(void)
{
  char buf[40];
  time_t now = time(0);
  while (1) {
    strftime(buf, 40, "%a %b %d %T", localtime(&now));
    puts(buf);
    fflush(stdout);
    sleep(1);
    now = time(0);
  }
}

This is possibly the simplest real-world use of the C time API, and even then the legacy cruft and bad design make the code significantly clumsier than it needs to be.

For comparison, here is the corresponding Lua code:

while true do
	print(os.date("%a %b %d %T"))
	io.stdout:flush();
	if not os.execute("sleep 1") then return 1; end
end

The library I describe in this article was not made because I expect it to see widespread use, but as a proof of concept of what could have been, and to illustrate some of the subtler design flaws of the time library.

Scope

I will be using the functions described in Eric S. Raymond’s Time, Clock, and Calendar Programming In C as a boundary for the C time API. These forty-something functions can be classified as:

The only function that doesn’t fit here is difftime(), which is just a subtraction.

Out of these, clock handling, system-level APIs for setting the time, alarm/timer handling (which has as much to do with signals as it does with time), and NTP correction are out of scope. This leaves:

The main types of the C time API that matter to us are time_t, struct timespec, and struct tm. time_t and struct timespec are used almost exclusively in kernel-level functions, whilst struct tm is used almost exclusively for converting time to and from strings.

Nanoseconds, Floating Point Precision, and the Y2262 Problem

It would be awfully convenient to represent time with nanosecond precision everywhere, all of the time. It would give strftime and strptime the ability to handle milli/micro/nanoseconds, and it would remove the need for the timespec struct used in a lot of system-level time functions.

A floating point number is able to store values up to 2^mantissa_length with integral precision. Actually, calculating floating point precision loss is surprisingly easy: any number below 2^n will have a precision of at least 2^(n - mantissa_length).

As an example, let's consider a long double that represents seconds. We lose nanosecond-level precision when 2^(n - 63) exceeds 10^-9, which means we lose precision at around 2^33 seconds:

$ date -d "@$((2**33))"
Wed Mar 16 07:56:32 AM CDT 2242

If this long double were to represent nanoseconds, we would lose precision at 2^63 nanoseconds, which lands around the year 2262 (the 20-year difference from 2242 is because 10^-9 is actually around 2^-29.8).
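
To make the arithmetic concrete, here is a small C sketch of the same calculation (my own illustration, not the Go program used for the table below; the mantissa sizes are the usual IEEE 754 / x87 ones):

#include <math.h>
#include <stdio.h>

// Seconds after the epoch at which a type with `mant` explicit mantissa bits
// can no longer represent steps of `res` seconds: res * 2^mant
static double cutoff(int mant, double res)
{
  return ldexp(res, mant);
}

int main(void)
{
  double ld = cutoff(63, 1e-9); // x87 long double, nanosecond steps
  printf("long double, 1 ns steps: ~%.0f s (~year %d)\n",
         ld, 1970 + (int)(ld / 31556952.0));                     // ~2262
  printf("double,      1 ns steps: ~%.0f s\n", cutoff(52, 1e-9)); // ~52 days
  return 0;
}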

Using some Go code, I was able to generate the following table:

Resolution | float (23)       | int (31)         | double (52)            | long / x87 long double (63)
1 ns       | 1970-01-01T00:00 | 1970-01-01T00:00 | 1970-02-22T02:59       | 2262-04-11T23:47
-1 ns      | 1969-12-31T23:59 | 1969-12-31T23:59 | 1969-11-09T21:00       | 1677-09-21T00:12
10 ns      | 1970-01-01T00:00 | 1970-01-01T00:00 | 1971-06-06T05:59       | 4892-10-07T21:52
-10 ns     | 1969-12-31T23:59 | 1969-12-31T23:59 | 1968-07-28T18:00       | -0953-03-26T02:07
1 us       | 1970-01-01T00:00 | 1970-01-01T00:35 | 2112-09-17T23:53       | 294247-01-10T04:00
-1 us      | 1969-12-31T23:59 | 1969-12-31T23:24 | 1827-04-16T00:06       | -290308-12-21T19:59
1 ms       | 1970-01-01T02:19 | 1970-01-25T20:31 | 144683-05-23T16:29     | 292278994-08-17T07:12
-1 ms      | 1969-12-31T21:40 | 1969-12-07T03:28 | -140744-08-10T07:30    | -292275055-05-16T16:47
1 s        | 1970-04-08T02:10 | 2038-01-19T03:14 | 142715360-12-06T03:48  | 292277026596-12-04T15:30
-1 s       | 1969-09-25T21:49 | 1901-12-13T20:45 | -142711421-01-25T20:11 | 292277026596-12-04T15:30

Looking at this chart alone, 64-bit integers don't seem much worse than long doubles, but keep in mind that an integer supports only one precision: there is a trade-off between resolution and the bounds of your epoch. Floating point values support all precisions, so there is no such trade-off.

For this reason, date_t is a long double floating point value of seconds since the epoch.
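
The library's actual definition may differ in detail, but as a minimal sketch of what this amounts to (dnow() is a hypothetical helper of mine, not part of the article's API):

#include <time.h>

typedef long double date_t; // seconds since the Unix epoch, fraction is sub-second

// Hypothetical helper: read the current time as a date_t (POSIX clock_gettime)
static date_t dnow(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  return (date_t)ts.tv_sec + (date_t)ts.tv_nsec / 1000000000.0L;
}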

“Broken Down Time”

Now that we have a base time type, there needs to be some way to convert between human-friendly and machine-friendly values, i.e. getting the year, month, and day. In the spirit of "100 functions for 1 data structure vs. 10 functions for 10 data structures", unless a function's job is to handle human-friendly time values, it will use date_t.

The way this is done in C is with struct tm, which has many problems.

Creating our own calendar structure to fix these problems:

struct cal {
        uint32_t nsec; // 0..1E9-1
        uint8_t   sec; // 0..60
        uint8_t   min; // 0..59
        uint8_t  hour; // 0..23
        uint8_t   day; // 0..30
        uint8_t month; // 0..11
        date_t   year; // Since Epoch
};

With 4 functions to handle them:

extern struct cal tocal(date_t d);
extern int wdayof(date_t d);
extern int ydayof(date_t d);
extern date_t   fromcal(struct cal cal);
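
As a brief usage sketch (assuming cal.year counts years from the 1970 epoch and that month/day are zero-based, as the struct comments indicate; printcal() is my own helper, not part of the library):

#include <stdio.h>

// Hypothetical usage: print a date_t as YYYY-MM-DD hh:mm:ss
void printcal(date_t d)
{
  struct cal c = tocal(d);
  printf("%04d-%02d-%02d %02d:%02d:%02d\n",
         (int)c.year + 1970, c.month + 1, c.day + 1,
         c.hour, c.min, c.sec);
}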

This fixes several problems with the existing struct tm:

“Why no timezones in the struct?”
The date passed into tocal() is ideally already adjusted to a certain timezone with the API described later in this article. The timezone API deals with date_t, not calendars, as a matter of principle and practicality.

The tragedy of tzset()

The timezone handling code in libc isn't outdated; that would imply it was once sufficient for timezone handling. tzset() and localtime() are the only ways to handle timezones in libc, and both of them have an insane relationship with each other and the process environment:

[Diagram: the relationship between tzset(), the environment, and localtime()]

A timezone in use is essentially a number of seconds to adjust by, and a name which can be printed (note that this is different from the name you would use to load the timezone, i.e. America/New_York vs. EST). Neither of these is constant: with daylight saving time and various other adjustments, it does not make sense to give a single constant offset or name for a timezone (even with DST variants).

localtime() respects this, and gives one zone name and one offset, both dependent on the time, in a tm struct (the struct fields that store them are non-portable, but this is the only way to properly handle timezones without parsing tzdb files).

Thus, to get a proper timezone offset and name, we have to juggle the TZ environment variable around calls to tzset() and localtime().
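
Here is a sketch of my own of that dance (it relies on the non-portable tm_gmtoff/tm_zone fields found on glibc and the BSDs, and on POSIX setenv()/unsetenv()):

#define _DEFAULT_SOURCE // exposes tm_gmtoff/tm_zone and setenv() on glibc
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Offset (seconds east of UTC) and abbreviation of timezone `tz` at time `t`
static long offset_at(time_t t, const char *tz, char *name, size_t namelen)
{
  char saved[128] = "";
  char *old = getenv("TZ");
  if (old)
    snprintf(saved, sizeof saved, "%s", old); // copy before we clobber environ

  setenv("TZ", tz, 1); // clobber the process-wide timezone...
  tzset();
  struct tm *tm = localtime(&t);
  long off = tm->tm_gmtoff;                   // non-portable field
  snprintf(name, namelen, "%s", tm->tm_zone); // non-portable field

  if (old) setenv("TZ", saved, 1); else unsetenv("TZ"); // ...then restore it
  tzset();
  return off;
}

int main(void)
{
  char name[16];
  long off = offset_at(time(0), "America/New_York", name, sizeof name);
  printf("%s is %+ld seconds east of UTC\n", name, off);
  return 0;
}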

Creating three base functions and two convenience functions to work with:

int      tzoffat(date_t d, char *tz); // Seconds east of UTC
char   *tznameat(date_t d, char *tz);
const char *mytz(void);

date_t      intz(date_t d, char *tz); // d+tzoffat(d, tz)
date_t    inmytz(date_t d);           // intz(d, mytz())
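
A brief usage sketch (dnow() and printcal() are the hypothetical helpers from the earlier sketches; the semantics of intz() and tznameat() are as described above):

#include <stdio.h>
// assumes the declarations above, plus dnow() and printcal() from earlier sketches

int main(void)
{
  date_t now = dnow();
  printcal(intz(now, "America/New_York")); // wall-clock time in New York
  printf("zone: %s, offset: %+d s\n",
         tznameat(now, "America/New_York"),
         tzoffat(now, "America/New_York"));
  return 0;
}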

Why weekdays in the time structure are bad

strptime() is special because it uses uncertainty as a tool: it won't touch anything in the calendar structure that isn't directly correlated with a format specification. This is as much of an asset as it is a liability. It's useful because you can read time with a set of presumptions (e.g. read mm/dd as the current year and not 1970), but it's a liability because you can unintentionally read time with a set of presumptions (e.g. read mm/dd and then print the wrong weekday because strptime() did not correct the weekday field).

Date String               | Makes Sense? | strptime result
Thursday February 16 1978 | Yes          | Thur. 2/16/78
February 16 1978          | Yes          | ??    2/16/78
February 1978             | Yes          | ??    2/??/78
Thursday February 1978    | No           | Thur. 2/??/78
Monday February 30        | No           | Mon.  2/30/??

This is not a problem if the weekday is instead inferred from the rest of the date, as wdayof() does.
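
To see the liability concretely, here is a small demonstration with the standard strptime() (a sketch of my own; exactly which fields strptime() recomputes varies by implementation, hence the hedged comments):

#define _XOPEN_SOURCE 700 // for strptime()
#include <stdio.h>
#include <time.h>

int main(void)
{
  struct tm tm = {0};             // tm_wday == 0, i.e. "Sunday"
  strptime("2/16", "%m/%d", &tm); // no weekday (or year) in the input

  char buf[64];
  // %A prints whatever is sitting in tm_wday; many implementations will not
  // have recomputed it, since no weekday or full date was parsed
  strftime(buf, sizeof buf, "%A %m/%d", &tm);
  puts(buf); // typically "Sunday 02/16", regardless of the real weekday
  return 0;
}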

Formatted Time

strftime() formatting has worked its way into many programming languages and applications, because it is the easiest, and sometimes the only, way of getting time-to-string conversion to work in many environments; it has also been standardized since C89. Making an analog's formatting a superset of strftime's functionality would provide the benefit of compatibility, since strftime format strings are often passed in from user input. However, strftime's mnemonics are poor and do not allow for easy extension.

We can free up space for more formatters by spelling the variations of existing formatters with multiple letters, rather than burning a separate single-letter specifier on each variant.

In defense of libc…

The time library in C is largely terrible because it is made entirely out of non-tessellating ideas and hacks, and has constantly resisted improvement by standardization. Many components in the library were developed decades apart. Many more were designed without the idea of internationalization in mind. The result of this is something that is “complete”, but not pleasant or elegant to use, and which leaves many pitfalls for bugs.

And as other languages improve their own time libraries by looking at the mistakes of C, it is interesting to think of how C itself could have improved its time library by looking at those same mistakes.

The GitHub project for this time library (Partial WIP): https://github.com/oliverkwebb/newtime/