# Appendix

## Git

This appendix provides a concise reference to essential Git concepts and commands, tailored for data analysts and researchers managing code and collaboration. For extended learning, explore the following resources:

- [Git Cheat Sheet (PDF)](https://training.github.com/downloads/github-git-cheat-sheet.pdf)
- [Git Cheat Sheets in Other Languages](https://training.github.com/)
- [Interactive Git Tutorial](http://try.github.io/)
- [Visual Git Cheat Sheet](http://ndpsoftware.com/git-cheatsheet.html#loc=remote_repo;)
- [Happy Git with R (for R Users)](https://happygitwithr.com/)

---

### Basic Setup

Configure your Git environment using the `git config` command:

* Set your name and email (recorded in every commit):

```bash
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
```

* Set your preferred text editor (e.g., for writing commit messages):

```bash
git config --global core.editor "code --wait" # VS Code
```

---

### Creating a Repository

To create a new Git repository in your project directory:

```bash
git init
```

This creates a `.git` directory where Git stores all version control information.

---

### Tracking Changes

Git tracks changes through a three-tier structure:

* **Working Directory**: your local folder with files.
* **Staging Area**: where you prepare changes before committing.
* **Local Repository**: stores committed snapshots of your code.

Common commands:

* Check status:

```bash
git status
```

* Add files to the staging area:

```bash
git add filename
git add . # Stage all changes in the current directory and below
```

* Commit staged changes:

```bash
git commit -m "A brief message describing the change"
```

---

### Viewing History and Changes

* Show changes that are not yet staged:

```bash
git diff
```

* Show the commit history:

```bash
git log
```

* Restore previous versions of files (this overwrites any uncommitted changes to that file):

```bash
git checkout HEAD filename         # Restore the last committed version
git checkout <commit-id> filename  # Restore the version from a specific commit
```

---
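
### Ignoring Files

To prevent certain files from being tracked by Git, list matching patterns in a `.gitignore` file. For example:

```bash
# .gitignore
*.dat
results/
```

Patterns only apply to untracked files; a file that has already been committed stays tracked until you remove it from the index with `git rm --cached filename`.

* View the contents using:

```bash
cat .gitignore
```

---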

### Remote Repositories

Git supports linking local and remote repositories (e.g., GitHub):

* Add a remote:

```bash
git remote add origin https://github.com/yourname/repo.git
```

* Push changes to the remote:

```bash
git push origin main # or 'master', depending on the default branch
```

* Pull changes from the remote:

```bash
git pull origin main
```

---

### Collaboration

* Clone a remote repository:

```bash
git clone https://github.com/username/repository.git
```

This creates a local copy and sets up a remote named `origin`.

---

### Branching and Merging

* Create and switch to a new branch:

```bash
git checkout -b new-branch-name
```

* Switch back to the main branch:

```bash
git checkout main
```

* Merge another branch into the current one:

```bash
git merge feature-branch
```

---
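
### Handling Conflicts

Merge conflicts occur when two branches change the same lines of a file. When this happens, Git:

* Pauses the merge and marks each conflicting region in the file with lines starting with `<<<<<<<`, `=======`, and `>>>>>>>`.
* Requires you to edit the file to resolve the conflict, stage it with `git add`, and then commit.

Always review and test your code after resolving conflicts.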
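Unresolved markers are easy to miss in a project with many scripts, so a small check before committing can help. The sketch below uses only base R; the function name `find_conflict_markers()` is hypothetical. It deliberately matches only the `<<<<<<<` and `>>>>>>>` lines, because a line of `=======` can also be a legitimate Markdown heading underline in an `.Rmd` file.

```r
# Hypothetical helper: report file/line pairs that still contain Git
# conflict markers ("<<<<<<< ..." or ">>>>>>> ..." at the start of a line)
find_conflict_markers <- function(paths) {
  hits <- lapply(paths, function(p) {
    lines <- readLines(p, warn = FALSE)
    idx <- grep("^(<{7}|>{7})( |$)", lines)
    data.frame(file = rep(p, length(idx)), line = idx,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# Example call: scan every R and R Markdown file below the working directory
# find_conflict_markers(list.files(pattern = "\\.(R|Rmd)$", recursive = TRUE))
```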
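
---

### Licensing

Understanding software licensing is essential in open-source collaboration:

* **GPL (GNU General Public License)**: Requires that derivative works, when distributed, also be released under the GPL.
* **Creative Commons**: Offers flexible combinations of attribution, sharing, and commercial-use restrictions. These licenses are designed for content such as text, data, and figures; Creative Commons itself recommends against using them for software.

Choose a license that matches how you want others to use and contribute to your work.

---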

### Citing Repositories

To make your work easy to cite:

* Include a citation file in your repository, such as `CITATION.cff` (recognized by GitHub) or, for an R package, `inst/CITATION`.
* Provide preferred citation formats (e.g., BibTeX, DOI).

This helps others acknowledge your work in academic or professional settings.
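In an R package, `inst/CITATION` is R code made of `bibentry()` calls. The sketch below builds one such entry for a hypothetical repository; every field value (key, title, author, year, version, URL) is a placeholder, and `bibtype = "Misc"` is one reasonable choice rather than a requirement.

```r
# A hypothetical citation entry; all field values are placeholders
repo_citation <- bibentry(
  bibtype = "Misc",
  key     = "doe2024repo",
  title   = "Example Analysis Repository",
  author  = person("Jane", "Doe"),
  year    = "2024",
  note    = "Version 1.0",
  url     = "https://github.com/username/repository"
)

# Human-readable form
print(repo_citation, style = "text")

# BibTeX form, ready to paste into a .bib file
toBibtex(repo_citation)
```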

---

### Hosting and Legal Considerations

Whether your repository is hosted on GitHub, GitLab, or an institutional server:

* Respect intellectual property.
* Avoid storing sensitive or personal data in version control; anything committed remains in the history even after the file is deleted.
* Follow organizational or institutional data security policies.

## Short-cut

These are RStudio shortcuts worth remembering when working with R. It takes some time before they become second nature, but they will save you a lot of time. Just like learning another language, the more you practice, the more comfortable you become.

| Action | Short-cut |
|--------------------------------------------------|---------------------------|
| Navigate folders in the console | type `""`, place the cursor between the quotes, then press `tab` |
| Show the keyboard short-cut reference | `alt + shift + k` |
| Go to file/function (everything in your project) | `ctrl + .` |
| Search across all files | `cmd + shift + f` |
| Switch between tabs | `ctrl + shift + .` |
| Insert a code snippet | type the snippet name, then `shift + tab` |
| Type faster | use `tab` for fuzzy-matched autocompletion |
| Pop up the console command history | `cmd + up` |

On Windows and Linux, use `ctrl` in place of `cmd`; on macOS, use `option` in place of `alt`.

Sometimes RStudio's Git pane cannot stage a folder because it is too large. In that case, open the `Terminal` pane in RStudio, run `git add -A` to stage all changes, and then commit and push as usual.
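
## Function short-cut

- Apply a function to each element of a list-column (e.g., the nested `data` column created by `tidyr::nest()`) to create a new variable: `mutate(mod = map(data, function))`
- Loop over an object with `for (i in seq_along(object))` instead of `for (i in 1:length(object))`, which misbehaves when `object` is empty
- Apply a function to each element and return a double vector: `map_dbl()`
- Apply a function to two inputs in parallel: `map2()`
- Plot time series data: `autoplot(data)` (requires a package that provides an `autoplot()` method for your data class, such as **forecast** or **feasts**)
- Fit a linear model with **parsnip**: `mod_tidy <- linear_reg() %>% set_engine("lm") %>% fit(price ~ ., data = data)`. Other engines (e.g., `"stan"`, `"spark"`, `"glmnet"`, `"keras"`) can be swapped in through `set_engine()`.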
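A short sketch of the `seq_along()` pitfall and of `map_dbl()`, assuming **purrr** is installed. It uses `map2_dbl()`, the double-returning variant of `map2()`, so the result is a plain vector; the input vectors are made-up toy data.

```r
library(purrr)

empty <- numeric(0)

# 1:length() on an empty vector gives c(1, 0), so a loop would run twice
bad_index <- 1:length(empty)
# seq_along() gives integer(0), so the loop body never runs
good_index <- seq_along(empty)

n_iterations <- 0
for (i in seq_along(empty)) {
  n_iterations <- n_iterations + 1
}

# map_dbl(): apply one function to each element, returning a double vector
means <- map_dbl(list(a = 1:3, b = c(4, 6)), mean)
means   # a = 2, b = 5

# map2_dbl(): walk along two inputs in parallel
totals <- map2_dbl(c(1, 2), c(10, 20), function(x, y) x + y)
totals  # 11 22
```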

- Sometimes data-masking cannot tell whether you mean an environment variable or a data variable. To bypass this, use the `.data` and `.env` pronouns: `.data$variable` or `.env$variable`. For example: `data %>% mutate(x = .env$variable / .data$variable)`.
- Problems with data-masking:
    + Unexpected masking by a data variable: use `.data` and `.env` to disambiguate.
    + A data variable cannot get through a function argument: tunnel it with `{{ }}`, or subset `.data` with `[[ ]]` when the column name is a string.
- Passing data variables through function arguments:
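
```{r eval=FALSE}
library("dplyr")

# Tunnel bare column names with {{ }}; "{{var}}" inside the new name
# reuses the argument's expression as the column name (e.g., "mass")
mean_by <- function(data, by, var) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{var}}" := mean({{ var }}, na.rm = TRUE))
}
mean_by(starwars, species, mass)

# When column names arrive as strings, glue the name with a single {}
# and subset the .data pronoun with [[ ]]
mean_by_chr <- function(data, by, var) {
  data %>%
    group_by(.data[[by]]) %>%
    summarise("{var}" := mean(.data[[var]], na.rm = TRUE))
}
mean_by_chr(starwars, "species", "mass")
```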
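To make the `.data`/`.env` disambiguation from the first bullet concrete, here is a minimal sketch assuming **dplyr** is installed. The data frame `df` and the object `divisor` are made up for illustration: a data column and an environment object deliberately share the name `divisor`.

```r
library(dplyr)

# Toy data (made up): a data column and an environment object share a name
df <- data.frame(x = c(2, 4), divisor = c(1, 2))
divisor <- 10

result <- df %>%
  mutate(
    by_env  = .data$x / .env$divisor,  # uses the environment value, 10
    by_data = .data$x / .data$divisor, # uses the data column, c(1, 2)
    bare    = x / divisor              # bare name: the data column wins
  )
result
```

The `bare` column shows why the pronouns matter: without them, the data column silently masks the environment object of the same name.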

- Trouble with selection:

```{r eval=FALSE}
library("dplyr")
library("purrr")

name <- c("mass", "height")
starwars %>% select(name)          # Data-var: selects the column called "name"
starwars %>% select(all_of(name))  # Env-var: all_of() selects the columns listed in `name`

# Take a character vector of column names with all_of()
averages <- function(data, vars) {
  data %>%
    select(all_of(vars)) %>%
    map_dbl(mean, na.rm = TRUE)
}
x <- c("Sepal.Length", "Petal.Length")
iris %>% averages(x)

# Another way: tunnel bare column names with {{ }}
averages <- function(data, vars) {
  data %>%
    select({{ vars }}) %>%
    map_dbl(mean, na.rm = TRUE)
}
iris %>% averages(c(Sepal.Length, Petal.Length))
```

## Citation

To cite the R packages used during this session, the following code prints BibTeX-formatted citations:

```{r, eval = FALSE}
# Collect attached non-base packages and namespaces loaded but not attached
info <- sessionInfo()
packages <- c(names(info$otherPkgs), names(info$loadedOnly))

# Print a BibTeX citation for each package
for (pkg in packages) {
  print(toBibtex(citation(pkg)))
}
```

You may wish to write this output to a `.bib` file with `writeLines()` for use in LaTeX or R Markdown documents.
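One way to do that with base R is sketched below. The helper name `write_package_bib()` and the file name `packages.bib` are hypothetical. Because some citation entries can come back with an empty key (a first line like `@Manual{,`), the sketch fills empty keys with the package name; that key scheme is an assumption, and a package with several entries would end up with duplicate keys.

```r
# Hypothetical helper: write BibTeX citations for several packages to one file
write_package_bib <- function(pkgs, file) {
  lines <- character(0)
  for (pkg in pkgs) {
    bib <- as.character(toBibtex(citation(pkg)))
    # Fill an empty key ("@Manual{,") with the package name; existing keys are kept
    bib <- sub("^(@[A-Za-z]+\\{),$", paste0("\\1", pkg, ","), bib)
    lines <- c(lines, bib, "")
  }
  writeLines(lines, file)
  invisible(lines)
}

bib_file <- file.path(tempdir(), "packages.bib")
write_package_bib(c("stats", "utils"), bib_file)
```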

---

## Install All Necessary Packages on Your Local Machine

To replicate the environment used in this book or session on another machine, follow these steps.

### Step 1: Export Installed Packages from Your Current Session

```{r}
# Get all installed packages
installed <- as.data.frame(installed.packages())

# Preview the installed packages
head(installed)

# Export the package names to a CSV file, keeping the "Package" column name
# so that Step 2 can read it back with $Package
write.csv(installed[, "Package", drop = FALSE],
          file = file.path(getwd(), "installed.csv"), row.names = FALSE)
```

### Step 2: Install Packages on a New Machine

Once you have transferred the `installed.csv` file to the new machine, run the following code to install any missing packages.

```{r, eval = FALSE}
# Read the list of required packages
required <- read.csv("installed.csv", stringsAsFactors = FALSE)$Package

# Get the list of packages already installed on this machine
current <- installed.packages()[, "Package"]

# Identify packages that are not yet installed
to_install <- setdiff(required, current)

# Install the missing packages (skip the call if nothing is missing)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

> ⚠️ Note: This approach assumes that all packages are available from CRAN. For packages from GitHub or Bioconductor, use `devtools::install_github()` or `BiocManager::install()` as appropriate.

This approach helps reproduce your computational environment, which is essential for robust data analysis and collaboration. Note that it restores package names only: the new machine gets the current versions, not necessarily the versions installed on the original machine.
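Relating to the note about GitHub and Bioconductor packages, it can help to see where each installed package came from before exporting the list. The sketch below is a heuristic, not an official API: the function name `package_source()` is hypothetical, and it assumes that **remotes**/**devtools** record a `RemoteType` field, that CRAN packages carry `Repository: CRAN`, and that Bioconductor packages declare `biocViews` in their DESCRIPTION files.

```r
# Hypothetical heuristic: guess where an installed package came from,
# based on fields in its DESCRIPTION file (not every installer records them)
package_source <- function(pkg) {
  desc <- suppressWarnings(packageDescription(pkg))
  if (!inherits(desc, "packageDescription")) {
    return("not installed")
  }
  if (identical(desc$Priority, "base")) {
    "base"                    # ships with R itself
  } else if (!is.null(desc$RemoteType)) {
    desc$RemoteType           # e.g., "github" for remotes/devtools installs
  } else if (!is.null(desc$Repository)) {
    desc$Repository           # e.g., "CRAN"
  } else if (!is.null(desc$biocViews)) {
    "Bioconductor"            # Bioconductor packages declare biocViews
  } else {
    "unknown"
  }
}

installed_pkgs <- rownames(installed.packages())
sources <- vapply(installed_pkgs, package_source, character(1))
table(sources)
```

Packages labelled `github` or `Bioconductor` can then be reinstalled with the tools named in the note rather than with `install.packages()`.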