Creating Dummy Dataset [ SAS/Python ]



Dummy datasets come in handy when we need a small amount of data to test the logic of our code. Because they are small, we can run complex logic on a handful of records, which means faster execution and quicker, easier checking of the output.

1.1: SAS: 
In SAS, the easiest way to create dummy data within the program editor is the DATALINES statement, e.g.

data Math;
INPUT a b ;
DATALINES; 
75 0.36
-42 0.50
17 -2.57
10 3.54
-1 -1.99
; 
run; 

The output window shows the data as: 
Obs   a     b
  1  75  0.36
  2 -42  0.50
  3  17 -2.57
  4  10  3.54
  5  -1 -1.99
 

1.2: Python:

To create the same dummy data in Python, we can use the pandas library and pass the data as a dictionary whose keys are the column names a and b and whose values are lists of column values, as shown below.

import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
df = pd.DataFrame({'a':[75, -42, 17, 10, -1],
                  'b':[0.36, 0.5, -2.57, 3.54, -1.99]})
df
 
The DataFrame df shows the same data, with one addition: an index column on the left, numbered from 0 (SAS numbers its Obs column from 1).
We will discuss indexes in a future post.
    a     b
0  75  0.36
1 -42  0.50
2  17 -2.57
3  10  3.54
4  -1 -1.99
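
If you prefer something that looks closer to SAS datalines, one possible approach is to keep the values in an inline text block and let pandas parse it. The sketch below is only an illustration under my own assumptions: the names raw and df_inline are made up for this example, and io.StringIO is used to make the string behave like a file.

```python
import io
import pandas as pd

# Inline text block playing the role of SAS DATALINES.
# The first row holds the column names; the rest are the same five records.
raw = """a b
75 0.36
-42 0.50
17 -2.57
10 3.54
-1 -1.99
"""

# sep=r"\s+" splits on any run of whitespace (spaces or tabs)
df_inline = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df_inline)
```

This keeps the data in a row-per-record layout, which can be easier to eyeball than one list per column when the dummy dataset has several columns.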

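For datasets with string values, the INPUT statement in SAS changes: each character variable is followed by a $ informat such as $15., which sets how many characters are read for that variable. Because this is formatted input, the values in the datalines must line up in fixed columns (padded with spaces, not tabs). An example of this would be:

data Data_String;
/* Fixed columns: Name 1-15, Department 16-35, Rank 36-50 */
input Name $15. Department $20. Rank $15. ;
DATALINES; 
Katrina Kaif   Public Relations    Director
Danish Sait    HR                  Director
Kabir Khan     IT                  Director
Cindee Law     Events              Consultant
Sidney Poirot  Marketing           Editor
Mike Morgan    HR                  Editor
;
run ;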

To do the same in Python, I again use a dictionary, this time with a list of string values for each column.
import pandas as pd
df = pd.DataFrame({'Name':['Katrina Kaif', 'Danish Sait', 'Kabir Khan', 'Cindee Law', 'Sidney Poirot', 'Mike Morgan'],
                   'Department':['Public Relations', 'HR', 'IT', 'Events', 'Marketing', 'HR'],
                   'Rank':['Director', 'Director', 'Director', 'Consultant', 'Editor', 'Editor']},
                  columns=['Name', 'Department', 'Rank'])
df
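
Once a dummy dataset like this is built, a few quick checks can confirm it looks the way we expect before we test any logic on it. The snippet below is a minimal sketch, not the only way to do this: it rebuilds the same six records so it can run on its own, and the names dept_counts and directors are just illustrative.

```python
import pandas as pd

# Rebuild the string dataset so this snippet runs on its own
df = pd.DataFrame({'Name': ['Katrina Kaif', 'Danish Sait', 'Kabir Khan',
                            'Cindee Law', 'Sidney Poirot', 'Mike Morgan'],
                   'Department': ['Public Relations', 'HR', 'IT',
                                  'Events', 'Marketing', 'HR'],
                   'Rank': ['Director', 'Director', 'Director',
                            'Consultant', 'Editor', 'Editor']},
                  columns=['Name', 'Department', 'Rank'])

# Number of rows and columns
print(df.shape)

# Data type pandas assigned to each column
print(df.dtypes)

# How many records fall in each department
dept_counts = df['Department'].value_counts()
print(dept_counts)

# Keep only the rows where Rank is 'Director'
directors = df[df['Rank'] == 'Director']
print(directors)
```

With only six rows, it is easy to verify by eye that the counts and the filtered rows are correct, which is the whole point of testing on dummy data.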

These dummy datasets will be referred to frequently in our data manipulation exercises.
You can find them used in the posts listed below:

1: Data Manipulation - Math Functions - Part 1
2: Data Manipulation - Math Functions - Part 2
3: Data Manipulation - String Functions - Part 1
4: Data Manipulation - String Functions - Part 2

You can also find all my code and tutorials on my GitHub page: github.com/dataisdank

Cheerio!
