Limitations of Pandas (0.18.0) HDFStore

I switched from CSV to HDF storage because reads and writes were much faster, which gave me a much tighter feedback loop when testing my analysis.

Here is a short list of current limitations of the fixed and table formats that I found very frustrating.

Fixed format:

  1. Can't store categorical columns (the category dtype)
  2. Can't read only selected columns (what usecols does in read_csv) or only a specific range of rows (for example, rows 100 to 200)
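
A rough workaround for both fixed-format limitations, assuming the data fits in memory: cast categorical columns to plain objects before writing, and emulate column and row-range selection by loading the whole object and slicing it afterwards. The helper names strip_categories and read_fixed_subset are hypothetical, not part of pandas, and the second one saves no I/O because the full store is still read.

```python
import pandas as pd


def strip_categories(df):
    """Hypothetical helper: cast category columns to object dtype.

    Returns the converted copy plus the original categorical dtypes,
    which are needed to restore the columns after reading.
    """
    cat_cols = df.select_dtypes(include="category").columns
    cat_dtypes = {col: df[col].dtype for col in cat_cols}
    plain = df.astype({col: object for col in cat_cols})
    return plain, cat_dtypes


def read_fixed_subset(path, key, columns=None, start=None, stop=None):
    """Hypothetical helper: emulate column and row-range selection.

    The whole object is loaded first and sliced in memory afterwards.
    """
    df = pd.read_hdf(path, key=key)
    df = df.iloc[start:stop]
    return df if columns is None else df[list(columns)]
```

After reading, `plain.astype(cat_dtypes)` should turn the object columns back into categoricals. If selective reads matter for speed, this sketch does not help, since nothing is skipped on disk.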

Table format:

  1. Can store categorical columns, and supports column selection and row-range queries
  2. Supports only non-wide dataframes. The snippet below ends with a very uninformative error:
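import pandas as pd
# Build a one-row frame with 10,000 columns, each with a long name
df = pd.DataFrame(columns=['Some Teribly Long String With Many Characters' + str(i) for i in range(10000)])
df.loc[0] = 3
# Writing this many columns in table format is the call that fails
df.to_hdf('test.hdf', key='main', format='table')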
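
One possible way around the wide-frame limitation, sketched under the assumption that the error depends on how many columns end up in a single table: split the columns into smaller groups and store each group under its own key. The helper names (column_chunks, wide_to_hdf, wide_from_hdf), the key layout key/partN, and the default of 500 columns per part are all assumptions made for illustration, not pandas API.

```python
import pandas as pd


def column_chunks(columns, chunk_size):
    """Yield consecutive slices of `columns`, each at most `chunk_size` long."""
    for start in range(0, len(columns), chunk_size):
        yield columns[start:start + chunk_size]


def wide_to_hdf(df, path, key, chunk_size=500):
    """Hypothetical helper: store `df` as several narrower tables.

    Parts are written as key/part0, key/part1, ... in table format.
    The file at `path` is overwritten (mode="w").
    Returns the number of parts written.
    """
    n_parts = 0
    with pd.HDFStore(path, mode="w") as store:
        for i, cols in enumerate(column_chunks(df.columns, chunk_size)):
            store.put(f"{key}/part{i}", df[cols], format="table")
            n_parts = i + 1
    return n_parts


def wide_from_hdf(path, key, n_parts):
    """Hypothetical helper: read the parts back and join them side by side."""
    with pd.HDFStore(path, mode="r") as store:
        parts = [store.get(f"{key}/part{i}") for i in range(n_parts)]
    # All parts share the original index, so concatenating along axis=1
    # rebuilds the columns in their original order.
    return pd.concat(parts, axis=1)
```

A call such as `n = wide_to_hdf(df, 'test.hdf', 'main')` followed by `wide_from_hdf('test.hdf', 'main', n)` would round-trip the frame. Whether 500 columns per part is small enough likely depends on how long the column names are, so treat it as a starting value to tune.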