v1.2.0
- Usage
- Unambiguous Datetime
- Ambiguous Datetime
- Working with formats
- Working with
DatetimeConfig - Working with TSDatetime
- DatetimeConfig
- Limitations
- Changelog
datetime_raw_str (str): Raw datetime stringformats (Sequence[str], optional): List of possible datetime formats. These datetime formats must be built using Supported Datetime Tokens. Defaults to empty tuple.- If the
DatetimeConfigobject in theconfigposition hasrequire_unambiguous_formatsset to:True, any of the formats produce a conflicting output, anAmbiguousDatetimeFormatsErrorwill be raised.False, the first valid parsed datetime valid will be returned.
- If the
config (DatetimeConfig, optional): Datetime Configuration. Defaults toDEFAULT_DATETIME_CONFIG.- It provides complementary information on how to mark parsed digits as day, month or year and also provide options to handle abbreviated time zones and fold for parsing ambiguous timestamps during daylight saving transitions. Ideally,
DatetimeConfigshould be constructed from the pipeline configuration with the following options: day_first (Optional[bool], optional): Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the dayTrueor monthFalse. Defaults toNone.year_first (Optional[bool], optional): Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the year. When the year has four digits, then whetheryear_firstisTrueorFalse, is decided by regex parsing done byDatetimeInfoclass. If bothyear_firstandday_firstare true, thenyear_firstwill take priority and resulting date format will be as YDM. Defaults toNone.tz_dict (dict, optional): A python dict that maps abbreviated timezone names to their corresponding offset. Defaults to {}.require_unambiguous_formats (bool, optional): Whether require datetime formats to be unambiguous. Defaults toFalse.- If
require_unambiguous_formats- is
Trueand any of the formats produce a conflicting output, anAmbiguousDatetimeFormatsErrorwill be raised. - is
False, the first valid parsed datetime valid will be returned.
- is
- If
- It provides complementary information on how to mark parsed digits as day, month or year and also provide options to handle abbreviated time zones and fold for parsing ambiguous timestamps during daylight saving transitions. Ideally,
- A
TSDatetimeobject.
Ambiguous input while ignoring ambiguity:
>>> from task_script_utils.datetime_parser import parse
>>> ambiguous_formats = [
"MM/DD/YYYY hh:mm:ss A z",
"DD/MM/YYYY hh:mm:ss A z",
]
>>> datetime_raw_str = "01/02/2003 04:05:06 AM UTC"
>>> parsed_datetime = parse(datetime_raw_str, formats=ambiguous_formats)
>>> parsed_datetime.tsformat() # first parsed output returned
'2003-01-02T04:05:06Z'Ambiguous input while requiring ambiguity:
>>> config = DatetimeConfig(require_unambiguous_formats=True)
>>> parsed_datetime = parse(datetime_raw_str, formats=ambiguous_formats, config=config)
## Raises AmbiguousDatetimeFormatsError
Traceback (most recent call last):
...
task_script_utils.datetime_parser.parser_exceptions.AmbiguousDatetimeFormatsError: Ambiguity found between datetime formats: ['MM/DD/YYYY hh:mm:ss A z', 'DD/MM/YYYY hh:mm:ss A z'], the parsed datetimes ['2003-01-02T04:05:06+00:00', '2003-02-01T04:05:06+00:00'], and the input datetime string '01/02/2003 04:05:06 AM UTC'.Forgivingly ambiguous input while requiring ambiguity:
>>> from task_script_utils.datetime_parser import DatetimeConfig
>>> datetime_raw_str = "02/02/2003 04:05:06 AM UTC"
>>> config = DatetimeConfig(require_unambiguous_formats=True)
>>> parsed_datetime = parse(datetime_raw_str, formats=ambiguous_formats, config=config)
>>> parsed_datetime.tsformat()
'2003-02-02T04:05:06Z'You can just pass the datetime_raw_str to parse() and it will parse it if there is no ambiguity.
Examples:
| Raw Datetime | isoformat result |
Note |
|---|---|---|
| 2021-12-13T12:12:12 America/Chicago | 2021-12-13T12:12:12-06:00 | Parsed as YYYY-MM-DD |
| 2021-13-12T12:12:12 America/Chicago | 2021-12-13T12:12:12-06:00 | Parsed as YYYY-DD-MM as date doesn't match YYYY-MM-DD |
| 27-12-2002 11:12:12 PM America/Chicago | 2002-12-27T23:12:12-06:00 | |
| 12:12:12Z 27-12-2002 | 2002-12-27T12:12:12+00:00 | |
| 2021-13-12T12:12:12Z | 2021-12-13T12:12:12+00:00 | |
| 12-12-2002T12:12:12.435324 America/Chicago | 2002-12-12T12:12:12.435324-06:00 | |
| 41-12-21 04:03:00 | 2041-12-21T04:03:00 | We can clearly tell year=41, day=21 and month=12 |
| May 26th 2013 12:12:12.0001 AM Asia/Kolkata | 2013-05-26T00:12:12.0001+05:30 | |
| Sunday, May 26th 2013 13:12:12 | 2013-05-26T13:12:12 | |
| Sunday, May 26th 2013 12:12:12 AM Asia/Kolkata | 2013-05-26T00:12:12+05:30 | |
| Sunday, May 26th, 2013 12:12:12 AM Asia/Kolkata | 2013-05-26T00:12:12+05:30 | |
| Sunday, May 26 2013 12:12:12 AM Asia/Kolkata | 2013-05-26T00:12:12+05:30 | |
| Sunday, May 26 2013 12:12:12.5677 AM Asia/Kolkata | 2013-05-26T00:12:12.5677+05:30 | |
| Sunday, May 26th 2013 12:12:12.5677 AM Asia/Kolkata | 2013-05-26T00:12:12.5677+05:30 |
If formats doesn't contain a match or is not passed, parse will use regex to parse the raw datetime string.
The regex parsing allows to capture digits of short formatted dates. The captured digits can all be two digits or one of them could be 4 digits long representing year.
Sometimes, it is hard to infer day, month and year from the parsed digits and this leads to ambiguity during parsing.
Following are some examples of ambiguous cases
| Cases | Reason for ambiguity | Note |
|---|---|---|
| 21-12-2T13:14:16 | Ambiguity: fits in more than one of the format: ('MM-DD-YY', 'YY-MM-DD', 'DD-MM-YY') | |
| 21-23-2020T12:13:14 | Year = 2020, but can't decide day and month between: ('21', '23') | |
| 2021-23-13T01:23:43 | Year = 2021, but can't decide day and month between: ('23', '13') | |
| 27-12-2002 12:12:12 CDT | OffsetNotKnownError: Offset value not known for 'CDT' | Can be fixed by passing DatetimeConfig.tz_dict |
| 27-12-2002 13:12:12 AM America/Chicago | InvalidTimeError: Hour is 13 but meridiem is AM | |
| 2021-10-31T02:45:00 Europe/Rome | AmbiguousFoldError: DatetimeConfig.fold must not be None to parse datetime without ambiguity. | Can be fixed by passing DatetimeConfig.fold |
| Oct 31st 2021 02:45:00.5677 AM Europe/Rome | AmbiguousFoldError: DatetimeConfig.fold must not be None to parse datetime without ambiguity. | Can be fixed by passing DatetimeConfig.fold |
from task_script_utils.datetime_parser import parse
datetime_formats = [
"DD-MM-YY HH:mm:ss z",
]Case 1: When raw datetime string matches with one of the format in datetime_formats
result = parse("21-12-20 12:30:20 Asia/Kolkata", formats=datetime_formats)
result.tsformat() # 2020-12-21T07:00:20Z
result.isoformat() # 2020-12-21T12:30:20+05:30Case 2: When raw datetime string doesn't match with one of the format in datetime_formats but can be parsed without any ambiguity. In this case, it can be inferred that year is 2020, day is 21 and hence month is 12. This is just an example and there could be multiple cases that may lead to ambiguity or invalid datetime. These are discussed in later sections of this documents
result = parse("21-12-2020 12:30:20 PM America/Chicago", formats=datetime_formats)
result.isoformat() # 2020-12-21T12:30:20-06:00
result.tsformat() # 2020-12-21T18:30:20ZCase 3: When raw datetime string doesn't match with one of the format in datetime_formats and is ambiguous. This is just an example and there could be multiple cases that may lead to ambiguity or invalid datetime. These are discussed in later sections of this documents
result = parse("21-12-20 12:30:20 PM America/Chicago", formats=datetime_formats)
'''
Traceback (most recent call last):
...
task_script_utils.datetime_parser.parser_exceptions.AmbiguousDateError:
Ambiguous date:21-12-20, possible formats: ('YY-MM-DD', 'DD-MM-YY')
'''| Token | Output | |
|---|---|---|
| Year | YYYY | 2000, 2001, 2002 ... 2012, 2013 |
| YY | 00, 01, 02 ... 12, 13 | |
| Y | 2000, 2001, 2002 ... 2012, 2013 | |
| Quarter | Q | 1 2 3 4 |
| Qo | 1st 2nd 3rd 4th | |
| Month | MMMM | January, February, March ... |
| MMM | Jan, Feb, Mar ... | |
| MM | 01, 02, 03 ... 11, 12 | |
| M | 1, 2, 3 ... 11, 12 | |
| Mo | 1st 2nd ... 11th 12th | |
| Day of Year | DDDD | 001, 002, 003 ... 364, 365 |
| DDD | 1, 2, 3 ... 4, 5 | |
| Day of Month | DD | 01, 02, 03 ... 30, 31 |
| D | 1, 2, 3 ... 30, 31 | |
| Do | 1st, 2nd, 3rd ... 30th, 31st | |
| Day of Week | dddd | Monday, Tuesday, Wednesday ... |
| ddd | Mon, Tue, Wed ... | |
| dd | Mo, Tu, We ... | |
| d | 0, 1, 2 ... 6 | |
| Days of ISO Week | E | 1, 2, 3 ... 7 |
| Hour | HH | 00, 01, 02 ... 23, 24 |
| H | 0, 1, 2 ... 23, 24 | |
| hh | 01, 02, 03 ... 11, 12 | |
| h | 1, 2, 3 ... 11, 12 | |
| Minute | mm | 00, 01, 02 ... 58, 59 |
| m | 0, 1, 2 ... 58, 59 | |
| Second | ss | 00, 01, 02 ... 58, 59 |
| s | 0, 1, 2 ... 58, 59 | |
| Fractional Second | SSSSSS | All fractional digits |
| AM / PM | A | AM, PM |
| Timezone | Z | -07:00, -06:00 ... +06:00, +07:00 |
| ZZ | -0700, -0600 ... +0600, +0700 | |
| z | Asia/Baku, Europe/Warsaw, GMT ... | |
| zz | EST CST ... MST PST | |
| Seconds timestamp | X | 1381685817, 1234567890.123 |
| Milliseconds timestamp | x | 1234567890123 |
Note: If zz token is used in format string, passing tz_dict is a must.
You can use SSSSSS as a token to parse any number on digits as fractional seconds.
TSDatetime object returned by parse() helps maintaining the precision of fractional seconds.
For example, in the examples below result.isoformat() and result.tsformat() maintains the number of digits in fractional seconds.
This is different from python's datetime object, which only allows 6 digits for microseconds.
This is visible as the result of result.datetime.isoformat(), where result.datetime property return pythonic datetime object
from task_script_utils.datetime_parser import parse
datetime_formats = [
"DD-MM-YY HH:mm:ss.SSSSSS z",
]
# Example 1
result = parse("2021-12-13T13:00:12.19368293274 Asia/Kolkata")
result.tsformat() # 2021-12-13T07:30:12.19368293274Z
result.isoformat() # 2021-12-13T13:00:12.19368293274+05:30
result.datetime.isoformat() # 2021-12-13T13:00:12.193682+05:30
# Example 2
result = parse("2021-12-13T13:00:12.1 Asia/Kolkata")
result.isoformat() # 2021-12-13T13:00:12.1+05:30
result.tsformat() # 2021-12-13T07:30:12.1Z
result.datetime.isoformat() # 2021-12-13T13:00:12.100000+05:30The only way to parse timestamps (if parse-able) is by passing DatetimeConfig object with tz_dict.
DatetimeConfig.tz_dict is a dict mapping abbreviated_tz to offset value. Defaults to empty dict.
from task_script_utils.datetime_parser import (
parse,
DatetimeConfig
)
sample_tz_dict = {
"IST": "+05:30",
"EST": "-05:00",
"CST": "-06:00",
}
dt_config = DatetimeConfig(
tz_dict=sample_tz_dict
)
result = parse("2021-12-25T00:00:00 IST", config=dt_config)
result.isoformat() # 2021-12-25T00:00:00+05:30
result.tsformat() # 2021-12-24T18:30:00ZAn exception will be raised if DatetimeConfig.tz_dict doesn't contain the abbreviated tz
result = parse("2021-12-12 14:15:16 BRST", config=dt_config)
'''
Traceback (most recent call last):
...
task_script_utils.datetime_parser.parser_exceptions.OffsetNotKnownError: Offset value not known for 'BRST'
'''"Z" can be used as token for capturing abbreviated timezones when using datetime_formats to parse the raw datetime string. Even in this case, for successful parsing, DatetimeConfig.tz_dict is required
dt_formats = [
"YYYY-MM-DD HH:mm:ss Z"
]
result = parse("2021-12-12 14:15:16 CST", formats=dt_formats, config=dt_config)
result.isoformat() # 2021-12-12T14:15:16-06:00An exception will be raised if DatetimeConfig.tz_dict doesn't contain the abbreviated tz.
After parser has extracted date parts using regex, DatetimeConfig if passed, will be used to parse short date with the help of flags like year_first: Optional[bool] and day_first: Optional[bool].
day_first: Optional[bool]:True/False. Defaults toNone- Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the day (True) or month (False).
year_first: Optional[bool]:True/False. Defaults toNone- Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the year.
Following table specifies the possible datetime formats depending on the values of day_first and year_first.
| year_first | day_first | possible formats |
|---|---|---|
| True | True | year-day-month |
| True | False | year-month-day |
| True | None | year-month-day year-day-month |
| False | True | day-month-year |
| False | False | month-day-year |
| False | None | month-day-year day-month-year |
| None | True | day-month-year |
| None | False | year-month-day month-day-year |
| None | None | month-day-year day-month-year year-month-day |
When the year has four digits, then whether year_first is true or false, is decided by regex parsing done by DatetimeInfo.
In this case the value of DatetimeConfig.year_first is ignored. Only DatetimeInfo.day_first is taken into account.
Consider following examples:
datetime_raw_str |
year_first |
day_first |
result (year-month-day) |
|---|---|---|---|
| 2021/02/3 04:03:00 | False | True | 2021-03-02T04:03:00 |
| 3/02/2021 04:03:00 | True | True | 2021-02-03T04:03:00 |
When year_first is true and day_first is None , year-month-day takes precedence over year-day-month. eg 20-12-11 satisfies both year-month-day and year-day-month but is will be parsed as day=11, month=12, year=2020
This can happen in following two cases:
- When
DatetimeConfig.year_first=TrueandDatetimeConfig.day_first=None - When
DatetimeConfigis not provided but regex parsing infers 4 digit year in the beginning. eg 2020-12-11.
For example:
datetime_raw_str |
year_first |
day_first |
result (year-month-day) |
|---|---|---|---|
| 21-11-12T00:00:00 | True | None | 2021-11-12T00:00:00 |
| 2021-11-12T00:00:00 | None | True | 2021-02-03T04:03:00 |
For every other cell in the possible formats table with multiple formats, only one of the formats must satisfy the raw datetime string. If None or more that one format satisfies the datetime string, it is still an ambiguous case.
Consider following examples on how year_first and day_first impacts the parsing result:
datetime_raw_str |
year_first |
day_first |
result (year-month-day) |
|---|---|---|---|
| 2021/11/07 04:03:00 | None | None | 2021-11-07T04:03:00 |
| 2021/11/07 04:03:00 | None | True | 2021-07-11T04:03:00 |
| 11\12\2021 04:03:00 | None | True | 2021-12-11T04:03:00 |
| 13/02/03 04:03:00 | None | True | 2003-02-13T04:03:00 |
| 01/02/03 04:03:00 | None | True | 2003-02-01T04:03:00 |
| 12/13/03 04:03:00 | None | False | 2003-12-13T04:03:00 |
| 13-02-03 04:03:00 | None | False | 2013-02-03T04:03:00 |
| 2021.11.7 04:03:00 | None | False | 2021-11-07T04:03:00 |
| 2021.11.07 04:03:00.00045000 | None | False | 2021-11-07T04:03:00.00045000 |
| 01/02/03 04:03:00.0 | True | None | 2001-02-03T04:03:00.0 |
| 13/02/03 04:03:00 | True | None | 2013-02-03T04:03:00 |
| 13/2/03 04:03:00 | False | None | 2003-02-13T04:03:00 |
| 01/02/03 04:03:00 | True | True | 2001-03-02T04:03:00 |
| 13/02/03 04:03:00 | True | True | 2013-03-02T04:03:00 |
| 1/02/03 04:03:00 | True | False | 2001-02-03T04:03:00 |
| 13/02/03 04:03:00 +05:30 | True | False | 2013-02-03T04:03:00+05:30 |
| 01/02/03T04:30:00 | False | True | 2003-02-01T04:30:00 |
| 01/02/3T04:30:00 | False | True | 2003-02-01T04:30:00 |
| 1/2/3T4:30:00 | False | True | 2003-02-01T04:30:00 |
| 1/2/3T4:3:00 | False | True | 2003-02-01T04:03:00 |
| 01/02/13T04:03:00 | False | True | 2013-02-01T04:03:00 |
| 01/02/03 04:03:00 | False | True | 2003-02-01T04:03:00 |
| 13/2/03 04:03:00.43500 | False | True | 2003-02-13T04:03:00.43500 |
| 01/02/03 04:03:00 | False | False | 2003-01-02T04:03:00 |
Defining year_first and day_first in DatetimeConfig, doesn't always guarantee a valid parse.
Consider the examples below:
datetime_raw_str |
year_first |
day_first |
possible formats |
error |
|---|---|---|---|---|
| 13/02/3 04:03:00 | False | False | MDY | InvalidDateError: month must be in 1..12 |
| 2021/32/07 04:03:00 | None | True | DMY | InvalidDateError: day is out of range for month, date=('2021', '32', '07') |
| 2021/11/14 04:03:00 | None | True | DMY | InvalidDateError: month must be in 1..12, |
| 01/02/03 04:03:00 | None | False | YMD/MDY | AmbiguousDateError: Ambiguous date:01-02-03, possible formats: ('MM-DD-YY', 'YY-MM-DD') |
| 01/02/03 04:03:00 | False | None | MDY/DMY | AmbiguousDateError: Can't decide day and month between: ('01', '02') |
| 01/15/11 04:03:00 | True | False | YMD | InvalidDateError: month must be in 1..12 |
parse() return a TSDatetime object.
It has following methods and properties:
isoformat(): return datetime string in ISO formattsformat(): returns datetime string in Tetrascience's ISO8601 formatdatetime: return pythondatetimeobject.change_fold(): change the fold value
TSDatetime allows us to maintain the precision of fractional seconds. Using TSDatetime.datetime return the python's datetime object but we might loose the sub-seconds precision.
DatetimeConfig is used to provide complementary information that helps to parse datetime
from task_script_utils.datetime_parser import DatetimeConfigA DatetimeConfig object has following attributes:
day_first: Optional[bool]:True/False. Defaults toNone- Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the day (True) or month (False).
year_first: Optional[bool]:True/False. Defaults toNone- Whether to interpret the first value in an ambiguous 3-integer date (e.g. 01/05/09) as the year.
- When the year has four digits, then whether
year_firstistrueorfalse, is decided by regex parsing done byDatetimeInfoclass.
tz_dict: Optional[dict]: adictmapping abbreviated_tz to offset value. Defaults toempty dict.
sample_tz_dict = {
"IST": "+05:30",
"EST": "-05:00",
"CST": "-06:00",
}
# Inbuilt tz_dict
from task_script_utils.datetime_parser.tz_dicts import USA
print(USA)
{
"EDT": "-04:00",
"EST": "-05:00",
"CDT": "-05:00",
"CST": "-06:00",
"MDT": "-06:00",
"MST": "-07:00",
"PDT": "-07:00",
"PST": "-08:00",
}fold: Optional[int]:0,1. Defaults toNone.- It is required during the 2 hour window when clocks are set back in a timezone which keeps track of daylight savings (such as IANA timezones like
Europe/London). - The allowed values for the fold attribute will be 0 and 1 with 0 corresponding to the earlier and 1 to the later of the two possible readings of an ambiguous local time.
- If fold is
None, Parser will check iffoldis needed or not to parse the time with no ambiguity. AmbiguousFoldErrorwill be raised iffoldis needed.
- It is required during the 2 hour window when clocks are set back in a timezone which keeps track of daylight savings (such as IANA timezones like
- It is not possible to parse just dates or just times alone.
e.g.
parse('2021-12-08')orparse('12:00:00')will raiseInvalidDateError
- Add
require_unambiguous_formatstoDatetimeConfigto enable/disable checking of ambiguous datetime formats passed to parsing functions - Add logic to
parser._parse_with_formatsto be used whenDatetimeConfig.require_unambiguous_formatsis set toTrueAmbiguousDatetimeFormatsErroris raised if mutually ambiguous formats are detected and differing datetimes are parsed
- Add parameter typing throughout repository
- Add public-facing
parse_with_formatsfunction that only attempts to parse datetime strings using the provided formats - Fix exception message when trying to parse 2-digit dates without using a format to only print out the possible formats that parse to a valid datetime
- Move
utils.pyintoutils/folder and split logic into two files to avoid cyclic dependency

