Limitations of Pandas (0.18.0) HDFStore
I switched from CSV to HDF storage because it gave me much faster read/write times, and therefore a much shorter feedback loop when testing my analysis.
Here is a short list of current limitations of both the fixed and table formats that I found very frustrating.
Fixed format:
- Can't store categorical columns (the category dtype)
- Can't select a subset of columns (like usecols in read_csv) or read a specific range of rows (e.g. rows 100 to 200)
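For the categorical limitation, one workaround I would expect to work (not something pandas provides) is to store the integer codes in the frame that goes to the fixed format and keep the categories aside. A minimal sketch; split_categoricals and join_categoricals are hypothetical helpers, and the code assumes unique column labels:

```python
import pandas as pd


def split_categoricals(df):
    """Hypothetical helper: swap category columns for their integer codes.

    Returns the plain frame (no category dtype left) and a dict mapping each
    converted column to its (categories, ordered) pair.
    """
    plain = df.copy()
    meta = {}
    for col in df.columns:
        if df[col].dtype.name == 'category':
            meta[col] = (df[col].cat.categories, df[col].cat.ordered)
            plain[col] = df[col].cat.codes  # missing values become -1
    return plain, meta


def join_categoricals(plain, meta):
    """Hypothetical helper: rebuild the category columns from their codes."""
    df = plain.copy()
    for col, (categories, ordered) in meta.items():
        df[col] = pd.Categorical.from_codes(plain[col], categories=categories, ordered=ordered)
    return df
```

The plain frame can then be written with format='fixed'; the categories would still have to be saved somewhere (for example as a Series under a separate key), which the sketch leaves out. This only addresses the category limitation; selecting columns or row ranges still needs the table format.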
Table format:
- Can store categorical columns, and supports selecting columns and specific row ranges (the columns, where, start and stop arguments of read_hdf)
- Supports only fairly narrow DataFrames. The snippet below fails with a very uninformative error:
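```python
import pandas as pd

# One row and 10,000 columns, each with a long name
df = pd.DataFrame(columns=['Some Terribly Long String With Many Characters' + str(i) for i in range(10000)])
df.loc[0] = 3
# Raises an uninformative error with the table format
df.to_hdf('test.hdf', 'main', format='table')
```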
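As far as I understand (I have not confirmed this against the 0.18.0 source), the table format stores the column names as metadata attached to the table node, and HDF5 caps that kind of header metadata at 64 KB. Here, 10,000 names of roughly 50 characters each come to about 500 KB of names alone. A workaround I would try is to split the frame into narrower column groups and store each group under its own key. A minimal sketch under that assumption; NAME_BUDGET_BYTES, column_groups, to_hdf_wide and read_hdf_wide are hypothetical names, and the 32 KB budget is a guess that leaves headroom, not a documented threshold:

```python
import pandas as pd

# Assumption: keep each group's column names well below HDF5's 64 KB header limit.
NAME_BUDGET_BYTES = 32 * 1024


def column_groups(columns, budget=NAME_BUDGET_BYTES):
    """Split column labels into consecutive groups whose names fit the budget."""
    groups, current, size = [], [], 0
    for col in columns:
        n = len(str(col).encode('utf-8'))
        # Start a new group once adding this name would exceed the budget
        if current and size + n > budget:
            groups.append(current)
            current, size = [], 0
        current.append(col)
        size += n
    if current:
        groups.append(current)
    return groups


def to_hdf_wide(df, path, key):
    """Hypothetical helper: write each column group under its own key."""
    keys = []
    for i, cols in enumerate(column_groups(df.columns)):
        sub_key = '{}_{}'.format(key, i)
        df[cols].to_hdf(path, sub_key, format='table')
        keys.append(sub_key)
    return keys


def read_hdf_wide(path, keys):
    """Hypothetical helper: read the groups back and join them column-wise."""
    return pd.concat([pd.read_hdf(path, k) for k in keys], axis=1)
```

Usage would be keys = to_hdf_wide(df, 'test.hdf', 'main') followed by read_hdf_wide('test.hdf', keys). The trade-off is that a where query now has to run against each group separately, and the sketch assumes unique column labels.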