Out of all the components of C, its time API is probably the one most plagued with legacy cruft, to the point that almost every regularly used element of it has some design decision that’s been obsolete for decades.
As an example, here is some code I use to print the current time for my status bar:
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char buf[40];
    time_t now = time(0);

    while (1) {
        strftime(buf, 40, "%a %b %d %T", localtime(&now));
        puts(buf);
        fflush(stdout);
        sleep(1);
        now = time(0);
    }
}
- time() unnecessarily takes a pointer argument to write to.
- strftime() has to write to a string of a fixed length; it cannot dynamically allocate one. (This is less legacy than it is bad design.)
- localtime() needs a pointer to a time_t value even though it does not change it, because of register size concerns on PDP-11s.
- sleep() cannot sleep for sub-second amounts of time, usleep() is deprecated, and its alternative nanosleep() requires you to define variables.
This is possibly the simplest real-world use of the C time API. And even then, the legacy cruft and bad design make this code significantly less organic.
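To make the nanosleep() complaint concrete, here is a minimal sketch of what sleeping for a fraction of a second takes. The helper name sleep_for() is hypothetical and only exists for this illustration; the calls inside it are plain POSIX.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for a fractional number of seconds.
 * Returns 0 on success, -1 on error (errno is set by nanosleep). */
int sleep_for(double seconds)
{
    if (seconds <= 0.0)
        return 0;

    /* The request has to be split into whole seconds and nanoseconds. */
    struct timespec req;
    req.tv_sec  = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);
    if (req.tv_nsec > 999999999L)
        req.tv_nsec = 999999999L;

    /* If a signal interrupts the sleep, continue with the remaining time. */
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Calling sleep_for(0.25) in the status bar loop would replace sleep(1), but only after two struct timespec variables and a retry loop have been written by hand.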
For comparison, here is the corresponding Lua code:
while true do
    print(os.date("%a %b %d %T"))
    io.stdout:flush();
    if not os.execute("sleep 1") then return 1; end
end
The library I describe in this article was not made because I expect it to see widespread use, but as a proof of concept of what could’ve been, and to illustrate some of the subtler design flaws of the time library.
Scope
I will be using the functions described in Eric S. Raymond’s Time, Clock, and Calendar Programming In C as a boundary for the C time API. These forty-something functions can be classified as:
- Alarm/Timer (alarm(), ualarm(), the timer_ group)
- Getting the current time (time(), clock_gettime(), gettimeofday(), ftime(), timespec_get())
- Setting the current time (settimeofday(), clock_settime())
- NTP correction (adjtime()/adjtimex())
- Converting system time to calendar format (localtime(), gmtime(), and their variants)
- Converting calendar time to system format (mktime(), timegm(), timelocal(), and their variants)
- Sleeping (sleep(), usleep(), nanosleep())
- Converting a time to a string (asctime(), ctime(), and their variants, as well as strftime())
- Converting a string to a time (getdate() and strptime())
- Timezone handling (tzset(), and the Berkeley timezone API)
- Clock handling (clock(), clock_getres())
The only function that doesn’t fit here is difftime(), which is just a subtraction.
Out of these, clock handling, system-level APIs for setting the time, alarm/timer handling (which has as much to do with signals as it does with time), and NTP correction are out of scope. This leaves:
- Getting the current time
- Converting system time to calendar format
- Converting calendar time to system format
- Converting a time to a string
- Converting a string to a time
- Timezone handling
- Sleeping
The main types of the C time API (that matter to us) are:
- time_t, which in practice is a 64-bit signed integer of seconds since 1/1/1970 00:00:00 UTC.
- struct tm, a broken down (both literally and figuratively) representation of calendar time.
- struct timespec, a representation of fractional time in seconds and nanoseconds.
- timezone_t, a BSD-exclusive opaque timezone type.
time_t and struct timespec are used almost exclusively in kernel-level functions, whilst struct tm is used almost exclusively for converting time to and from strings.
Nanoseconds, Floating Point Precision, and the Y2262 problem
It would be awfully convenient to represent time in nanosecond form everywhere, all of the time.
It’d give strftime() and strptime() the ability to print milli/micro/nanoseconds. And it’d remove the need for the timespec struct used in a lot of system-level time functions.
A floating point number is able to store values up to 2^mantissa_length with integral precision. Actually, calculating floating point precision loss is surprisingly easy. For an exponent n, any number below 2^n will have a precision of at least 2^(n - mantissa_length).
As an example, let’s consider a long double that represents seconds. We lose nanosecond-level precision when 2^(n - 63) is 10^-9, which means we lose precision at around 2^33 seconds.
$ date -d "@$((2**33))"
Wed Mar 16 07:56:32 AM CDT 2242
If this long double were to represent nanoseconds, we’d lose precision at 2^63 nanoseconds, around the year 2262. (The 20-year difference is due to the fact that 10^-9 is actually around 2^-29.9.)
Using some Go code I was able to generate the following table:
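The precision rule above can be checked directly in C. This is a minimal sketch: spacing_at() is a hypothetical helper name, and it only relies on LDBL_EPSILON from float.h, so the result depends on how wide long double is on your platform.

```c
#include <float.h>

/* Hypothetical helper: the gap between x and the next representable
 * long double above it, for x >= 1. Every time x crosses a power of two,
 * the gap doubles. */
long double spacing_at(long double x)
{
    long double gap  = LDBL_EPSILON;  /* spacing for values in [1, 2) */
    long double pow2 = 1.0L;

    while (pow2 * 2.0L <= x) {
        pow2 *= 2.0L;
        gap  *= 2.0L;
    }
    return gap;
}
```

On x87, where the long double significand is 64 bits wide, spacing_at() for 2^33 seconds comes out to 2^-30 seconds, just under a nanosecond. On platforms where long double is an ordinary double, the same call reports a much coarser gap.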
Type/Resolution | float (23) | int (31) | double (52) | long/x87 long double (63) |
1 ns | 1970-01-01T00:00 | 1970-01-01T00:00 | 1970-02-22T02:59 | 2262-04-11T23:47 |
-1 ns | 1969-12-31T23:59 | 1969-12-31T23:59 | 1969-11-09T21:00 | 1677-09-21T00:12 |
10 ns | 1970-01-01T00:00 | 1970-01-01T00:00 | 1971-06-06T05:59 | 4892-10-07T21:52 |
-10 ns | 1969-12-31T23:59 | 1969-12-31T23:59 | 1968-07-28T18:00 | -0953-03-26T02:07 |
1 us | 1970-01-01T00:00 | 1970-01-01T00:35 | 2112-09-17T23:53 | 294247-01-10T04:00 |
-1 us | 1969-12-31T23:59 | 1969-12-31T23:24 | 1827-04-16T00:06 | -290308-12-21T19:59 |
1 ms | 1970-01-01T02:19 | 1970-01-25T20:31 | 144683-05-23T16:29 | 292278994-08-17T07:12 |
-1 ms | 1969-12-31T21:40 | 1969-12-07T03:28 | -140744-08-10T07:30 | -292275055-05-16T16:47 |
1 s | 1970-04-08T02:10 | 2038-01-19T03:14 | 142715360-12-06T03:48 | 292277026596-12-04T15:30 |
-1 s | 1969-09-25T21:49 | 1901-12-13T20:45 | -142711421-01-25T20:11 | 292277026596-12-04T15:30 |
Looking at this chart alone, 64-bit integers don’t seem much worse than long doubles. But keep in mind that integers support only one precision, so there is a trade-off between resolution and the bounds of your epoch. Floating point values support all precisions, so there is no such trade-off.
For this reason, date_t is a long double floating point value of seconds since the epoch.
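As a rough sketch of what this looks like in practice, the current time can be read into such a value with C11’s timespec_get(). The function name date_now() is hypothetical, and the typedef is my reading of the description above, not necessarily the library’s actual declaration.

```c
#include <time.h>

/* Assumption: date_t as described in the text, seconds since the epoch. */
typedef long double date_t;

/* Hypothetical helper: the current time as fractional seconds.
 * Returns -1 on failure, mirroring time(). */
date_t date_now(void)
{
    struct timespec ts;

    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return -1.0L;

    /* Fold seconds and nanoseconds into a single value. */
    return (date_t)ts.tv_sec + (date_t)ts.tv_nsec / 1e9L;
}
```

Once the two timespec fields are folded together, the rest of the program only ever has to pass one number around.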
“Broken Down Time”
Now that we have a base time type, there needs to be some way to convert between machine-friendly and human-friendly values, i.e. getting the year, month, and day.
In the spirit of “100 functions for 10 data structures vs. 10 functions for 1 data structure”, unless a function’s job is to handle human-friendly time values, it will use date_t.
The way this is done in C is with struct tm, which has many problems:
- It is almost always handled through pointers to statically allocated storage that gets overwritten (gmtime()).
- There is no way to represent sub-second time.
- tm_mday starts at one instead of zero (as the rest of the struct values do) for no reason.
- tm_wday and tm_yday make it harder to construct completely valid structs.
- mktime(), being the main way to convert back into time_t, changes the struct that is passed in.
Creating our own calendar structure to fix these problems:
struct cal {
    uint32_t nsec;  // 0..1E9
    uint8_t sec;    // 0..60
    uint8_t min;    // 0..59
    uint8_t hour;   // 0..23
    uint8_t day;    // 0..30
    uint8_t month;  // 0..11
    date_t year;    // Since Epoch
};
With 4 functions to handle them:
extern struct cal tocal(date_t d);
extern int wdayof(date_t d);
extern int ydayof(date_t d);
extern date_t fromcal(struct cal cal);
This fixes several problems with the existing struct tm:
- Fractional time
- Day of month starts with 0 instead of 1
- Years over INT_MAX are possible
- Smaller than struct tm
- Any value where the fields are within range corresponds to a unique valid time.
- “Why no timezones in the struct?”
  - The date passed into tocal() is ideally already adjusted to a certain timezone with the API described later in this article. The timezone API deals with date_t, not calendars, as a matter of principle and practicality.
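For illustration, here is one way the three conversion functions could be written. This is a sketch under stated assumptions, not the library’s actual implementation: it stores the full Gregorian year in a long long instead of the date_t field shown above, it ignores leap seconds, and it uses the well-known days-from-civil arithmetic popularised by Howard Hinnant. The helpers floordiv() and floor_secs() are hypothetical names.

```c
#include <stdint.h>

typedef long double date_t;  /* seconds since 1970-01-01T00:00:00 UTC */

struct cal {
    uint32_t  nsec;   /* 0..999999999 */
    uint8_t   sec;    /* 0..59 (leap seconds ignored in this sketch) */
    uint8_t   min;    /* 0..59 */
    uint8_t   hour;   /* 0..23 */
    uint8_t   day;    /* 0..30, zero-based */
    uint8_t   month;  /* 0..11, zero-based */
    long long year;   /* assumption: full Gregorian year, e.g. 1978 */
};

/* Floor division for a positive divisor (C's / truncates toward zero). */
static long long floordiv(long long a, long long b)
{
    return a / b - (a % b < 0);
}

/* Whole seconds at or below d. Assumes d fits in a long long. */
static long long floor_secs(date_t d)
{
    long long s = (long long)d;
    return ((date_t)s > d) ? s - 1 : s;
}

struct cal tocal(date_t d)
{
    long long secs = floor_secs(d);
    long long days = floordiv(secs, 86400);
    long long rem  = secs - days * 86400;       /* 0..86399 */

    long long z   = days + 719468;              /* days since 0000-03-01 */
    long long era = floordiv(z, 146097);        /* 400-year cycles */
    long long doe = z - era * 146097;           /* 0..146096 */
    long long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    long long doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* from 1 March */
    long long mp  = (5 * doy + 2) / 153;        /* 0 = March .. 11 = February */

    struct cal c;
    c.nsec  = (uint32_t)((d - (date_t)secs) * 1e9L);
    if (c.nsec > 999999999u)
        c.nsec = 999999999u;
    c.sec   = (uint8_t)(rem % 60);
    c.min   = (uint8_t)(rem / 60 % 60);
    c.hour  = (uint8_t)(rem / 3600);
    c.day   = (uint8_t)(doy - (153 * mp + 2) / 5);
    c.month = (uint8_t)(mp < 10 ? mp + 2 : mp - 10);
    c.year  = yoe + era * 400 + (c.month <= 1);
    return c;
}

date_t fromcal(struct cal c)
{
    long long y    = c.year - (c.month <= 1);   /* year that starts in March */
    long long era  = floordiv(y, 400);
    long long yoe  = y - era * 400;
    long long mp   = c.month > 1 ? c.month - 2 : c.month + 10;
    long long doy  = (153 * mp + 2) / 5 + c.day;
    long long doe  = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    long long days = era * 146097 + doe - 719468;

    return (date_t)days * 86400 + c.hour * 3600L + c.min * 60 + c.sec
           + c.nsec / 1e9L;
}

/* 0 = Sunday .. 6 = Saturday; 1970-01-01 was a Thursday. */
int wdayof(date_t d)
{
    long long days = floordiv(floor_secs(d), 86400);
    return (int)(((days % 7) + 11) % 7);
}
```

Note that the weekday never has to live in the structure: it falls out of the day count, so it can never disagree with the date.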
The tragedy of tzset()
The timezone handling code in libc isn’t outdated; that would imply it was once sufficient for timezone handling.
tzset() and localtime() are the only ways to handle timezones in libc, and they have an insane relationship with each other and the process environment.
A timezone in use is essentially a number of seconds to adjust by, and a name which can be printed (note that this is different from the name you would use to load the timezone, i.e. America/New_York vs. EST). Neither of these things is constant: with daylight saving time and various other adjustments, it does not make sense to give a constant number of seconds or a constant name for a timezone (even with DST variants).
localtime() respects this, and gives one zone name and one offset, both dependent on the time, in a tm struct (the struct members that store these are non-portable, but it’s the only way to properly handle timezones without parsing tzdb files).
Thus, to get a proper timezone offset and name, we have to:
- Set TZ to the timezone name
- Call tzset() (which secretly provides good data to localtime)
- Give the time_t form of the time to localtime_r (so the global variable localtime keeps doesn’t get overwritten)
- Get tm_gmtoff and tm_zone (on musl, tm_zone is overwritten whenever a new timezone is loaded, which means the string has to be duplicated, and therefore it must be the user’s job to free it)
- Set TZ back to whatever it was
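The steps above can be sketched roughly as follows. The function name tz_offset_at() is hypothetical, it takes a plain time_t instead of date_t to keep the example short, and it depends on the non-portable tm_gmtoff member. It is also not thread-safe, since it rewrites the process-wide TZ variable.

```c
#define _DEFAULT_SOURCE  /* exposes setenv(), localtime_r(), tm_gmtoff on glibc */
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: seconds east of UTC for zone tz at time t,
 * or LONG_MIN on failure. */
long tz_offset_at(time_t t, const char *tz)
{
    /* Remember the old TZ; getenv()'s result may be invalidated by setenv(). */
    const char *old = getenv("TZ");
    char *saved = NULL;
    if (old != NULL) {
        saved = malloc(strlen(old) + 1);
        if (saved == NULL)
            return LONG_MIN;
        strcpy(saved, old);
    }

    long offset = LONG_MIN;
    if (setenv("TZ", tz, 1) == 0) {
        tzset();                      /* load the requested zone */
        struct tm tm;
        if (localtime_r(&t, &tm) != NULL)
            offset = tm.tm_gmtoff;    /* non-portable struct member */
    }

    /* Put the environment back the way it was. */
    if (saved != NULL) {
        setenv("TZ", saved, 1);
        free(saved);
    } else {
        unsetenv("TZ");
    }
    tzset();
    return offset;
}
```

Fetching the zone name as well would mean duplicating tm_zone before TZ is restored, which is where the ownership problem mentioned above comes from.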
Creating three base functions and two convenience functions to work with:
int tzoffat(date_t d, char *tz); // Seconds east of UTC
char *tznameat(date_t d, char *tz);
const char *mytz(void);
date_t intz(date_t d, char *tz); // d+tzoffat(d, tz)
date_t inmytz(date_t d); // intz(d, mytz())
Why weekdays in the time structure are bad
strptime() is special because it uses uncertainty as a tool. It won’t touch anything in the calendar structure that isn’t directly correlated with a formatting specification. This is as much of an asset as it is a liability: it’s useful because you can read time with a set of presumptions (i.e. read mm/dd as the current year and not 1970), but it’s a liability because you can unintentionally read time with a set of presumptions (i.e. read mm/dd and then print a wrong weekday because strptime() did not correct the weekday).
Date String | Makes Sense? | strptime result? |
Thursday February 16 1978 | Yes | Thur. 2/16/78 |
February 16 1978 | Yes | ?? 2/16/78 |
February 1978 | Yes | ?? 2/??/78 |
Thursday February 1978 | No | Thur. 2/??/78 |
Monday February 30 | No | Mon. 2/30/?? |
This is not a problem if the weekday is inferred from other information instead of being stored in the structure.
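With the existing API, the usual way to get a trustworthy weekday is to let mktime() recompute it after parsing, since mktime() ignores the incoming tm_wday and derives it from the date fields. Here is a small sketch of that; weekday_of() is a hypothetical helper, and it assumes the default C locale so that English month and day names parse.

```c
#define _XOPEN_SOURCE 700  /* for strptime() */
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse text with strptime(), then let mktime()
 * recompute tm_wday from the year, month, and day.
 * Returns 0 (Sunday) .. 6 (Saturday), or -1 on failure. */
int weekday_of(const char *text, const char *format)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);
    tm.tm_mday = 1;            /* default day if the format has none */

    if (strptime(text, format, &tm) == NULL)
        return -1;

    tm.tm_hour  = 12;          /* noon, to stay clear of DST gaps at midnight */
    tm.tm_isdst = -1;          /* let mktime() work out DST itself */
    if (mktime(&tm) == (time_t)-1)
        return -1;

    return tm.tm_wday;         /* overwritten by mktime(), not by the parser */
}
```

Parsing "Monday February 16 1978" with "%A %B %d %Y" this way yields Thursday, because whatever weekday strptime() stored is discarded by the normalisation step.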
Formatted Time
strftime() formatting has worked its way into many programming languages and applications, because it is the easiest, and sometimes the only, way of getting time-to-string conversion to work in many environments. It has also been standardized since C89.
Making the formatting of an analog of it a superset of strftime’s functionality would provide the benefit of compatibility, since strftime formatting strings are often passed in from user input.
However, the mnemonics for strftime are poor and do not allow for easy extension.
We can free up space for more formatters by using multiple letters in the variations of other formatters.
- s - seconds
- Us - microseconds
- Ns - nanoseconds
- m - minutes
- h - hours
- ch - clock hours (12-hour time)
- ih - indicator for hours (AM/PM)
- d - Month day
- w - Full weekday
- aw - Abbreviated weekday
- nw - Number of weekday
- M - month name
- aM - Abbreviated month
- nM - Number of month
- y - year
- Dy - day of year
- Cy - Century
- z - zone name
- oz - zone offset
- nz - index name of zone (i.e. America/New_York)
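To show how these names line up with what already exists, here is a sketch of a lookup table from the multi-letter mnemonics to their closest strftime() conversions. The table and the function strftime_equiv() are hypothetical and are not taken from the library. The pairings are my best guess at the nearest equivalent, and details such as zero-based versus one-based numbering may differ.

```c
#include <stddef.h>
#include <string.h>

struct spec_map {
    const char *name;     /* mnemonic from the list above */
    const char *classic;  /* closest strftime() conversion */
};

/* Us, Ns and nz are left out: strftime() has nothing comparable. */
static const struct spec_map spec_table[] = {
    { "s",  "%S" }, { "m",  "%M" }, { "h",  "%H" },
    { "ch", "%I" }, { "ih", "%p" }, { "d",  "%d" },
    { "w",  "%A" }, { "aw", "%a" }, { "nw", "%w" },
    { "M",  "%B" }, { "aM", "%b" }, { "nM", "%m" },
    { "y",  "%Y" }, { "Dy", "%j" }, { "Cy", "%C" },
    { "z",  "%Z" }, { "oz", "%z" },
};

/* Hypothetical helper: the strftime() conversion closest to a mnemonic,
 * or NULL if there is none. Matching is case-sensitive. */
const char *strftime_equiv(const char *name)
{
    for (size_t i = 0; i < sizeof spec_table / sizeof spec_table[0]; i++) {
        if (strcmp(spec_table[i].name, name) == 0)
            return spec_table[i].classic;
    }
    return NULL;
}
```

The three entries with no equivalent are exactly the ones that need fractional time or a zone’s load name, neither of which struct tm can carry portably.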
In defense of libc…
The time library in C is largely terrible because it is made entirely out of non-tessellating ideas and hacks, and has constantly resisted improvement by standardization. Many components in the library were developed decades apart. Many more were designed without the idea of internationalization in mind. The result of this is something that is “complete”, but not pleasant or elegant to use, and which leaves many pitfalls for bugs.
And as other languages try to improve their own time libraries by looking at the mistakes of C, it is interesting to think of ways C itself could’ve improved its own time library by looking at these same mistakes.
The GitHub project for this time library (Partial WIP): https://github.com/oliverkwebb/newtime/