# Reducing dtype size of a Numpy/Pandas array

I ran into memory problems when processing a very large dataframe.

The *problem* is that pandas uses the `float64` and `int64` numpy dtypes by default, even in cases where it is totally unnecessary (e.g. when you only have binary values). Furthermore, it is not even possible to change this default behaviour.
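
For instance (a quick interactive check, assuming a 64-bit platform):

```
>>> import pandas as pd
>>> pd.Series([0, 1, 1, 0]).dtype   # int8 would suffice
dtype('int64')
>>> pd.Series([0.5, 1.5]).dtype     # float16 would suffice
dtype('float64')
```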

Hence, I wrote a function which finds the smallest possible dtype for a specific array.

```
import numpy as np

def safely_reduce_dtype(ser):  # ser: pandas.Series or numpy.ndarray
    orig_dtype = "".join([x for x in ser.dtype.name if x.isalpha()])  # 'float' or 'int'
    mx = 1
    for val in np.asarray(ser).ravel():
        if orig_dtype == 'int' and val >= 0:
            # min_scalar_type() picks an unsigned type for non-negative values
            # (e.g. uint8 for 200, which would overflow int8), so probe the
            # negative value that must also fit in the signed type
            val = -val - 1
        new_itemsize = np.min_scalar_type(val).itemsize
        if mx < new_itemsize:
            mx = new_itemsize
    new_dtype = np.dtype(orig_dtype + str(mx * 8))
    return new_dtype  # or convert directly with ser.astype(new_dtype)
```

So, e.g.:

```
>>> import pandas as pd
>>> serie = pd.Series([1, 0, 1, 0], dtype='int32')
>>> safely_reduce_dtype(serie)
dtype('int8')
>>> float_serie = pd.Series([1., 0., 1., 0.])  # float64 by default
>>> safely_reduce_dtype(float_serie)
dtype('float16')
```

Using this you can reduce the size of your dataframe significantly: up to a factor of four for floats (`float64` → `float16`) and eight for integers (`int64` → `int8`).
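
For a whole dataframe, a minimal sketch could downcast every numeric column in place (the column names and row counts here are invented for illustration):

```
import numpy as np
import pandas as pd

# Invented example data: one binary int64 column, one float64 column
df = pd.DataFrame({
    'flags': pd.Series([1, 0, 1, 0] * 250000, dtype='int64'),
    'ratio': pd.Series([0.5, 1.5] * 500000, dtype='float64'),
})
print(df.memory_usage().sum())  # ~16 MB with the default 64-bit dtypes

# Downcast each numeric column to the smallest dtype that holds its values
for col in df.select_dtypes(include=[np.number]).columns:
    df[col] = df[col].astype(safely_reduce_dtype(df[col]))

print(df.dtypes)                # flags: int8, ratio: float16
print(df.memory_usage().sum())  # ~3 MB after downcasting
```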

## Update:

Since `Pandas 0.19` there is `pd.to_numeric(series, downcast='float')` (with `downcast='integer'`, `'signed'`, and `'unsigned'` variants for integer data). The function above was written before that was released and can still be used with older versions.
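
For example (note that pandas does not downcast floats below `float32`):

```
>>> import pandas as pd
>>> pd.to_numeric(pd.Series([1, 0, 1, 0]), downcast='integer').dtype
dtype('int8')
>>> pd.to_numeric(pd.Series([1., 0., 1., 0.]), downcast='float').dtype
dtype('float32')
```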